Introduction.


The overall goal of this proposal is to develop synthetic lectins (SLs) that bind to prostate cancer associated glycans and glycoproteins (CAGs). These studies are being pursued to develop this methodology into a robust system that can diagnose and monitor the stage of prostate cancer. The proposed system rests on the observation that aberrant glycosylation is a hallmark of cancer. As such, the differential display of boronic acid moieties on peptides and peptoids will allow us to monitor the changes (over- or neo-expression of CAGs) associated with oncogenesis and metastasis. This provides a new paradigm for the development of a prostate cancer diagnostic. AIM 1 describes a library-based approach for the discovery of SLs targeting CAGs. AIM 2 describes biochemical and biophysical approaches to identify the factors that are required for the selective recognition of CAGs. We expect the results of these studies to help us improve the design of the libraries described in AIM 1, leading to second- and third-generation libraries. In AIM 3, selective and cross-reactive SLs will be assembled into an SL-based array. The efficacy of this array will be evaluated using both prostate cancer derived CAGs and actual cell lines.
Body.

Significant progress has been made. We continue to learn a great deal about how our SLs bind glycans and glycoproteins. We have also made great headway toward assessing the ability of an SL Array to respond to secreted glycoproteins and human tissue samples. In consultation with clinical colleagues, we have made strides toward simplifying the analysis platform and method. Working with statistical collaborators, we are continuing to improve the robustness of our analysis while reducing sources of interface variation.

Specifically, we have moved our bead-based readout from a fluorescence microscope to a standard flow-based system (e.g., a fluorescence activated cell sorter, FACS) while maintaining a significant portion of the assay validity. Furthermore, we have demonstrated that the patterns generated by our SL Array in response to cell membrane extracts from cultured cells mimic the patterns obtained from the culture media of those same cell lines. This supports the concept of a serum-based diagnostic. Similarly, we have begun to study glycosylation patterns from human tissue samples using our SL Array and have obtained excellent discrimination between matched healthy and cancerous tissues.

In addition, we are continuing to develop and improve the screening methods used to identify new SLs. To move our efforts toward more biologically and disease-relevant models, we have used cell membrane extracts rather than purified proteins as the target component of our library screening method. We have also developed a novel dual-label competitive binding screening assay that likewise relies on cell membrane extracts.

Because of our association with and proximity to the Center for Colon Cancer Research (CCCR) at the University of South Carolina (USC), much of our initial method development has used colon cancer associated cell lines and tissue samples. As we have previously demonstrated, and as discussed below, the transition to prostate cancer related samples has been straightforward once the "bugs" have been worked out using colon cancer associated samples.

Task 1. Use a library-based approach to identify synthetic lectins that bind to prostate cancer associated glycans/glycoproteins (CAGs). Note that this aim will continue over the life of the grant to continuously identify more selective and useful SLs. (Months 1-36)

Initiating PI:

Task 1 a): Synthesize bead-based peptoid libraries that incorporate phenylboronic acid moieties. (Months 1-4)

Peptoid libraries were constructed from 9 amine building blocks (diversity = 9^5; 5.9 × 10^4 members) using the scheme depicted in Figure 1A. Briefly, bromoacetic acid was coupled using DIC to TentaGel-NH2 beads already coated with our MRBB linker sequence. The beads were split, and each of the 9 amines was added to an equal portion of beads and reacted in DMF. The beads were then washed, re-pooled and treated with bromoacetic acid and DIC to couple the second diversity element. The Dde protecting group was selectively removed using hydrazine to uncover the primary amine to be conjugated to phenylboronic acid (PBA). PBA installation was verified using alizarin red S (ARS), and several beads were randomly selected for library quality evaluation.

Figure 1. Optimization of screening conditions. A. Scheme for generating the peptoid library. B. Images of the phenylboronic acid (PBA) library with either 0% E. coli lysate (EL, left) or 0.1% EL (right). C. Bead quantification with increasing amounts (0-5%) of EL. D. Increasing amounts of NaCl decrease the binding of the library down to background.
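The library size quoted above follows from the split-and-pool design. A diversity of 9^5 corresponds to 9 building blocks at each of 5 diversified positions, which gives 59,049 possible members. It is also useful to know how many beads must be screened to sample most of that diversity. The sketch below uses the standard Poisson approximation, which is an assumption rather than part of our protocol: drawing N beads at random from a library of size D represents a fraction of about 1 - exp(-N/D) of the distinct members. The function names are hypothetical.

```python
import math

def library_diversity(n_building_blocks, n_positions):
    """Distinct members from a split-and-pool synthesis: blocks ** positions."""
    return n_building_blocks ** n_positions

def expected_coverage(n_beads, diversity):
    """Poisson estimate of the fraction of distinct members present on n_beads beads."""
    return 1.0 - math.exp(-n_beads / diversity)

def beads_for_coverage(target, diversity):
    """Smallest bead count whose Poisson coverage reaches the target fraction."""
    return math.ceil(-diversity * math.log(1.0 - target))

d = library_diversity(9, 5)
print(d)                                    # 59049
print(round(expected_coverage(d, d), 3))    # 0.632 (one bead per member on average)
print(beads_for_coverage(0.95, d))          # roughly 3x the library size
```

Under this approximation, screening one library-equivalent of beads samples only about 63% of the sequences.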

With the synthesized libraries in hand, we turned our attention to identifying ideal screening conditions. Our goal was to find stringent conditions so that we could identify highly selective hits from our libraries. Based on previous studies, we used E. coli lysates (EL) both to pre-block the beads and to minimize nonspecific interactions during analyte incubation. Figure 1B shows the drastic decrease in fluorescence when 0.1% EL is added to the screening buffer. An EL gradient (Figure 1C) identified 0.1% EL as the optimal concentration, because higher concentrations decreased fluorescence too strongly. We then optimized the salt concentration (Figure 1D) and determined that 150 mM NaCl is ideal.

Task 1 b): Screen peptoid libraries with prostate cancer associated glycoproteins and complex glycans to identify highly selective and cross-reactive synthetic lectin (SL) hits. (Months 3-36)

To identify SLs that are specific for CAGs (Figure 4A), we designed a screening platform that used biotinylated complex carbohydrates complexed with fluorescently labeled streptavidin (SA) (Figure 4B). Briefly, a series of biotinylated carbohydrates (i.e., sialyl Lewis X, sialyl Lewis A, Lewis X and Lewis A) were obtained from the Consortium for Functional Glycomics (CFG).

Because of our previous success with peptide library screening, we initially optimized our screening conditions using phenylboronic acid based peptide libraries. We used these instead of peptoid libraries incorporating either phenylboronic acid or benzoboroxole moieties. For this assay, we pre-incubated the CFG glycans with FITC-streptavidin for 1 h at a 4:1 glycan:SA ratio and then added this complex to our PBA-peptide library in screening buffer. Using this method, we identified 2 hits when screening with sLeX as the target glycan. These hits were sequenced as follows:

sLeX1 = MRBB-LD*RFRD*L-Ac
sLeX2 = MRBB-RD*RWVD*Y-Ac

These results validate this screening modality for both peptide- and peptoid-based libraries. In addition, further analyses demonstrate that these hits bind sialyl Lewis X better than Lewis X or either of the Lewis A derivatives (see below).

During the second year of funding, we continued to focus on building SLs from peptides because of the considerable success we have had with this structural motif. As such, we screened our fixed-position library (FPL) against fluorescein-labeled prostate specific antigen (FITC-PSA). PSA has shown little to no validity as a diagnostic biomarker for prostate cancer. We nonetheless chose it for screening because it displays many of the glycans overexpressed in prostate cancer and is commercially available.

Briefly, 2 mg of library beads were washed twice with PBSG and then incubated with 1% BSA in PBSG for 15 minutes to reduce nonspecific background binding. The solution was removed from the beads, and 0.01 mg/mL FITC-PSA in PBS was added. The beads were incubated with this solution for 20 hours at room temperature. The supernatant was then removed, and the resin was washed three times with PBS before imaging by fluorescence microscopy. Figure 2 shows the green channel of a typical image from this screening protocol; the brighter bead would be classified as a "hit." We are currently working to sequence these hits using MALDI-MS.

We have also begun to screen our libraries with cell membrane extracts, and to run competitive binding assays between cell membrane extracts from normal and cancerous prostate cell lines. For all of the work discussed below, we have used RWPE-1 cells as our normal/healthy cell line and PC3 as our cancerous cell line. Figure 3A shows a normalized binning chart from screening our FPL with rhodamine-labeled membrane extracts (red diamonds) or fluorescein-labeled membrane extracts (green diamonds). The separation between "hits" (indicated within the blue box) and "nonspecific background binding" SLs is sufficient to obtain acceptable hit rates of under 10%.
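The split between hits and background in a binning chart can be expressed as a threshold on fold-intensity over the typical bead. The sketch below assumes that the median bead intensity approximates nonspecific background. It also uses a hypothetical fold-change cutoff of 2.0, which is not the threshold used in our screens. The bead intensities are invented for illustration.

```python
from statistics import median

def classify_hits(intensities, fold_cutoff=2.0):
    """Return indices of beads brighter than fold_cutoff x the median intensity.

    The median serves as a robust background estimate, on the assumption that
    most beads in a screen are non-binders.
    """
    background = median(intensities)
    return [i for i, x in enumerate(intensities) if x > fold_cutoff * background]

def hit_rate(intensities, fold_cutoff=2.0):
    """Fraction of beads classified as hits."""
    return len(classify_hits(intensities, fold_cutoff)) / len(intensities)

# Hypothetical normalized bead intensities (arbitrary units).
beads = [10, 11, 9, 12, 10, 35, 8, 11, 10, 9, 41, 10, 12,
         11, 9, 10, 10, 11, 9, 10, 10, 11, 9, 10, 10]
print(classify_hits(beads))   # [5, 10]
print(hit_rate(beads))        # 0.08
```

Raising the cutoff trades a lower hit rate for a greater risk of missing weak but selective binders.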

In the competitive binding screen, one sample is labeled with fluorescein while the other is labeled with rhodamine. In our current analysis, the samples are the cell membrane extracts from PC3 and RWPE-1. Each cell membrane extract was separately labeled with each dye to produce R-RWPE-1, F-RWPE-1, R-PC3 and F-PC3, where R = rhodamine and F = fluorescein. A portion of the FPL was then incubated with each of these cell membrane extracts separately (i.e., alone) and with all possible combinations in a 1:1 w/w ratio.

Figure 3B shows individual color channels from images of a portion of the FPL binding to a mixture of F-RWPE-1 and R-PC3. Each channel was imaged under the appropriate filter for its dye, i.e., DSR for rhodamine (red channel) and GFP3 for fluorescein (green channel).

Both images show the same beads, taken through different emission filters. In the green image, the bead indicated by the yellow arrow is brighter than the other beads, relative to the brightness of this same bead in the red image. This indicates that the SL attached to this bead binds more tightly to the fluorescein-labeled analyte than to the rhodamine-labeled analyte.

Figure 3C expresses this more quantitatively. It shows the fold increase in brightness for the six brightest beads in each image with respect to the average background binding. Most of the intensities are close to one, indicating that these beads are generally of equal brightness and close to the average bead intensity. However, bead "4", which corresponds to the bead indicated by the yellow arrow, displays nearly a 2-fold enhancement in binding to F-RWPE-1. This enhancement holds both against the other beads binding to F-RWPE-1 and against all of the beads binding to R-PC3.
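The per-bead comparison in Figure 3C can be written as a fold-change in each channel: each bead's intensity divided by the average intensity of all beads in that channel. A bead's channel preference then follows from the ratio of its two fold-changes. The sketch below is a hypothetical reconstruction of that arithmetic. The 1.5 preference cutoff and the six example intensities are assumptions, not measured data.

```python
from statistics import mean

def fold_over_background(channel):
    """Each bead's intensity divided by the mean intensity of its channel."""
    bg = mean(channel)
    return [x / bg for x in channel]

def channel_preference(green, red, cutoff=1.5):
    """Flag beads whose green/red fold-change ratio exceeds cutoff (or 1/cutoff).

    green and red hold intensities for the same beads imaged through the
    fluorescein and rhodamine filters. Returns {bead_index: 'green' or 'red'}.
    """
    fg, fr = fold_over_background(green), fold_over_background(red)
    prefs = {}
    for i, (g, r) in enumerate(zip(fg, fr)):
        ratio = g / r
        if ratio > cutoff:
            prefs[i] = "green"
        elif ratio < 1 / cutoff:
            prefs[i] = "red"
    return prefs

# Invented intensities for six beads; index 3 plays the role of bead "4".
green = [50, 52, 48, 100, 50, 50]
red = [40, 42, 38, 40, 41, 39]
print(channel_preference(green, red))  # {3: 'green'}
```

Normalizing each channel separately means that differences in dye brightness or labeling efficiency between the two extracts do not by themselves create a preference.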

Screening our FPL with individual and mixed prostate derived cell membrane extracts has identified five new sequences (Table 1). Most significantly, these SLs were identified by screening our library with known prostate associated samples. Most excitingly, they were identified from an incredibly heterogeneous mix of membrane-supported proteins and glycoproteins, all labeled with a fluorescent dye. Furthermore, nearly half of these sequences came from mixtures of different, highly heterogeneous cell membrane extracts, and some degree of binding selectivity was still achieved! We are currently continuing to evaluate the binding selectivity of these new SLs and to assess their utility as part of our SL Array.

Table 1. Sequences of SLs identified against prostate derived cell line membrane extracts.

SL Hit   Sequence                     Cell Line Screened   Cell Line Selectivity
SL10     H2N-RLD*ARSD*G-BBRM-resin    F-PC3                --
SL11     H2N-RLD*YLTD*R-BBRM-resin    F-RWPE-1/R-PC3       PC3
SL12     H2N-RLD*GFYD*Q-BBRM-resin    F-RWPE-1/R-PC3       RWPE-1
SL13     H2N-RTD*GLAD*V-BBRM-resin    F-RWPE-1             --
SL14     H2N-RYD*RASD*V-BBRM-resin    R-PC3                --

Figure 3. A) Binning chart from screening the FPL against rhodamine-labeled (red diamonds) or fluorescein-labeled (green diamonds) membrane extracts. B) Red- and green-channel images from screening the FPL with F-RWPE-1 and R-PC3. Arrows indicate one SL that binds F-RWPE-1 more than R-PC3. C) Quantification of this preference (bead 4).
Task 1 c): Upon identifying >5 hits, we will sequence, resynthesize, and determine the selectivity of identified hits towards the target that they were selected against as well as the other prostate cancer associated glycoproteins and complex glycans. (Months 3-36)

We set out to validate our two PBA-peptide hits from Task 1b, sLeX1 and sLeX2, by first resynthesizing them. We then screened these hits against LeX, LeA and sLeA (Figure 4A). Both hits bind sLeX better than LeX or either of the LeA derivatives (Figure 4C). These results are encouraging and will be expanded as the number of hits increases after additional rounds of screening.

We have been able to sequence SLs from our fixed-position library using traditional Edman degradation techniques without removing the boronic acid moiety. As previously discussed, we had accepted the low success rate (~40%) for sequencing hits using MALDI-MS/MS. We looked forward to using the Orbi-Trap MS, with which we obtained enhanced sensitivity and seemingly better sequencing efficiency. However, the increased sensitivity often hindered our analysis: it introduced a higher background signal than MALDI-MS, which complicated the MS/MS analysis. Consequently, when provided the opportunity to evaluate Edman degradation methods for sequencing our SLs, we enthusiastically tried them.

The most significant change to our design was that we could no longer acylate the N-terminus of our SLs, because doing so would preclude this technique. Thus, a new library was synthesized using the split-and-pool protocol previously described. The primary modification from prior library syntheses concerned the final steps. Previously, we cleaved the Fmoc and acylated the terminal amine after coupling the final R. Instead, the Dab(ivDde) protecting groups were removed using hydrazine, and the boronic acid groups were introduced via reductive amination before the Fmoc protecting group was removed.

The new general sequence for this fixed-position SL library is H2N-R-X-D*-X-X-X-D*-X-B-B-R-M-resin. Here X denotes a randomized amino acid chosen from R, A, G, V, N, Q, L, F, S, Y and T, and D* indicates diaminobutanoic acid with a 2-methylphenylboronic acid attached.
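The sketch below makes the fixed-position design concrete. It expands the template H2N-R-X-D*-X-X-X-D*-X-B-B-R-M-resin, treating each X as one of the 11 listed amino acids and each R and D* as fixed. With five randomized positions, this gives 11^5 = 161,051 possible sequences. The function names and the sampling helper are hypothetical and only illustrate the combinatorics.

```python
import random

# Residues allowed at each randomized (X) position, as listed in the text.
X_RESIDUES = ["R", "A", "G", "V", "N", "Q", "L", "F", "S", "Y", "T"]

# Variable region of the fixed-position library, N- to C-terminus; the
# constant B-B-R-M linker to the resin is appended when formatting.
TEMPLATE = ["R", "X", "D*", "X", "X", "X", "D*", "X"]

def diversity(template=TEMPLATE, residues=X_RESIDUES):
    """Number of distinct members: len(residues) ** (number of X positions)."""
    return len(residues) ** template.count("X")

def random_member(rng, template=TEMPLATE, residues=X_RESIDUES):
    """Draw one library member by filling each X with a random residue."""
    return [rng.choice(residues) if p == "X" else p for p in template]

def format_member(member):
    """Render a member in the Table 1 notation, e.g. H2N-RLD*YLTD*R-BBRM-resin."""
    return "H2N-" + "".join(member) + "-BBRM-resin"

rng = random.Random(0)
print(diversity())                        # 161051
print(format_member(random_member(rng)))  # one randomly drawn member
```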

Perhaps the most compelling argument for switching from MS-based sequencing to Edman-based analysis is that the boronic acids do not need to be removed from the SL before sequencing. With MALDI-MS, we found that oxidizing the boronic acid and then cleaving the resulting 2-methylphenol simplified our analyses. With the Orbi-Trap MS, however, we observed not only removal of the phenylboronic acid (PBA) but also partial and irregular cleavage of the SL backbone. This again only served to complicate our analysis of Orbi-Trap data.

In our first efforts to sequence SLs by Edman degradation, we obtained beautiful data for an SL peptide sequence that had never been coupled with the boronic acids (as would be expected). These data included a new peak associated with Dab in the corresponding LC traces. However, Edman-based sequencing of known sequences after PBA removal showed numerous amino acids in each cycle. This indicated that the SL peptide backbone had been partially hydrolyzed during PBA removal. Control studies confirmed that incomplete coupling during SL synthesis was not to blame for this result. Consequently, we decided to evaluate this approach without removing the PBA groups.

In the fixed-position library, we know where the D* residues are, and we only need to identify the five randomized amino acids before, between and after them. The Edman-based approach should therefore work as long as the boronic acids do not interfere with the phenylisothiocyanate chemistry (Figure 5). Remarkably, the PBA does not appear to interfere with the analysis. In fact, a new peak consistent with the D* moiety is observed in the LC trace (Figure 5). This opens the door to completely randomized libraries in which the D* residues can also be sequenced. Using this approach, we have identified four new SLs from library screens using prostate cancer associated glycoproteins and cell membrane extracts (Table 1).

Figure 4. New screening targets. A. Structures of cancer associated glycans. B. Diagram of the glycan screening method using FITC-streptavidin (SA). C. Library screens of the different glycans from A using the approach in B.
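In the fixed-position library, R and D* occupy known positions, so Edman cycle calls can be checked against the template before the randomized residues are read off. The sketch below assumes one residue call per cycle, with "D*" standing for the new PBA-Dab peak. It rejects reads that disagree at a fixed position. The function name and input format are hypothetical. The example read is the SL2 sequence from Table 2.

```python
TEMPLATE = ["R", "X", "D*", "X", "X", "X", "D*", "X"]

def decode_edman(calls, template=TEMPLATE):
    """Check Edman cycle calls against the fixed-position template.

    calls: residue calls, one per cycle, starting at the N-terminus.
    Returns the residues found at the randomized (X) positions, in order.
    Raises ValueError if the read is too short or a fixed position disagrees.
    """
    if len(calls) < len(template):
        raise ValueError(f"expected {len(template)} cycles, got {len(calls)}")
    randomized = []
    for cycle, (expected, call) in enumerate(zip(template, calls), start=1):
        if expected == "X":
            randomized.append(call)
        elif call != expected:
            raise ValueError(f"cycle {cycle}: expected {expected}, got {call}")
    return randomized

print(decode_edman(["R", "T", "D*", "R", "F", "L", "D*", "V"]))  # ['T', 'R', 'F', 'L', 'V']
```

Checking the fixed positions gives a built-in consistency test: a read that calls anything other than D* in cycles 3 and 7 is flagged rather than silently accepted.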

Partnering PI:

Task 1 a): Synthesize bead-based peptide libraries that incorporate phenylboronic acid moieties. (Months 1-4)

Two peptide-based fixed-position libraries were synthesized on TentaGel resin, analogous to those previously described. In the past, we assessed coupling efficiency using MALDI-MS; this time, however, we ran into difficulties. Our MS analysis consistently indicated incomplete deprotection of the ivDde protecting groups on the Dab side chains, where the boronic acids are attached. The incompletely deprotected material appeared to make up a significant portion of the product, up to 60%. Moreover, our MS analysis frequently suggested incomplete coupling of the first Dab moiety. We had not encountered these problems previously, yet they appeared even when re-synthesizing known SLs.

Consequently, we thoroughly evaluated the quality of the vendor batches of TentaGel resin, hydrazine (used to deprotect the ivDde) and Fmoc-Dab(ivDde)-OH. We were using the same vendors as in the past, and no apparent anomalies were detected in these reagents. Furthermore, a detailed investigation of the literature revealed considerable "controversy," with similar problems reported for deprotecting the ivDde protecting group.

We thus opted to re-evaluate our synthetic approach and tried different side-chain amine protecting groups on Dab, including alloc and MTT. Deprotection of alloc proved sensitive to water and oxygen, making it difficult to work with at times. The MTT group was easy to deprotect, but amino acids bearing it on the side chain were often difficult to couple to the resin. This difficulty arose from the size of the MTT group and the resulting increase in steric interactions.

Interestingly, when we synthesized SLs on a cleavable Rink Amide resin using Fmoc-Dab(ivDde)-OH, MALDI-MS confirmed fully deprotected SLs as the major product. We then investigated the relative ratios of protected and deprotected SLs from the TentaGel resin more rigorously using LC-MS. Remarkably, by this method the mono- and di-protected analogs combined made up only ~3% of the product. Yet MALDI-MS of the same sample showed nearly 40% protected products. We carried out numerous control experiments, including measuring the ionization efficiencies of all of the possible products and confirming sequences by Orbi-Trap MS/MS. These experiments confirmed the validity of the LC-MS analysis.

Edman degradation of SL2. The first 4 residues from the N-terminus are R-T-Dab-R. Note the new peak for Dab in the third trace.

Ultimately, we accepted the fickle nature of MALDI-MS and again felt confident in our synthetic protocols for library development. Confirming the attachment of the boronic acids involved less uncertainty, because it relied on a previously identified binding assay with alizarin red S (ARS). In the end, we identified other orthogonal amine protecting groups (i.e., MTT on long side-chain amines). These will simplify syntheses for studies on polyvalency, as well as the incorporation of other side-chain functionality such as biotin. Using the Orbi-Trap MS, we also obtained better sensitivity and sequencing efficiency than with MALDI-MS.

Task 1 b): Screen peptide libraries with prostate cancer associated glycoproteins and complex glycans to identify highly selective and cross-reactive synthetic lectin (SL) hits. (Months 1-36)

The screening methods previously used to identify SL1-SL5 were employed to screen portions of our library against prostate cancer associated glycoproteins. As we refine these screening methods, the quality of the hits we identify continues to improve. Initially, we screened the library with ovalbumin (OVA) and porcine stomach mucin (PSM). These glycoproteins carry glycans that have been associated with prostate cancer (PCa): mannose and N-acetylglucosamine (GlcNAc) on OVA, and GlcNAc and fucose on PSM. From these screens, four new SLs were isolated and sequenced (SL6-SL9 in Table 2).

Beyond simply identifying new SLs, we have learned a great deal about how we carry out our analysis, specifically how we image our resin and extract color data. In all of our image acquisition and analysis, we have paid close attention to image quality and to how we extract luminosity data. Still, until recently all decisions had been made by the user, which can introduce user bias. To limit this bias, we wrote a bead-finding and data-extraction algorithm in MATLAB.

Of particular interest to us was eliminating inhomogeneity across the field of view. Such inhomogeneity can arise when users differ in illumination source settings, focus or hardware alignment. The simplest approach was to define a region of interest (ROI) to reduce edge effects. Within the ROI, the software "finds" the beads based on relative intensity changes. We also added options to reject identified objects based on size (area or circumference), circularity and/or pixel saturation at any given percentile of each bead's pixels. Remarkably, we reprocessed existing images with this algorithm using only the ROI and size-based rejection. Classification accuracy for 5 cell lines, estimated by leave-one-out methods, improved from 97% to 99%.
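The MATLAB routine itself is not reproduced here. As an illustration only, the sketch below re-expresses its main steps in plain Python under assumed parameters:

- crop to a rectangular ROI;
- threshold pixels at a fold-change over the ROI median;
- group the remaining pixels into 4-connected objects;
- reject objects by area and by pixel saturation at a chosen percentile.

The circularity filter is omitted. All function names, parameter names and default values are hypothetical.

```python
import math
from collections import deque
from statistics import median

def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

def find_beads(img, roi, fold=2.0, min_area=20, max_area=5000,
               sat_value=255, sat_percentile=90):
    """Find beads in a 2-D list of 8-bit intensities.

    roi = (row0, row1, col0, col1), half-open. Pixels brighter than
    fold x median(ROI) are grouped into 4-connected objects. An object is
    rejected if its area is out of range or its sat_percentile-th pixel is
    saturated. Returns (mean_intensity, area) for each accepted bead.
    """
    r0, r1, c0, c1 = roi
    sub = [row[c0:c1] for row in img[r0:r1]]
    rows, cols = len(sub), len(sub[0])
    threshold = fold * median(v for row in sub for v in row)
    seen = [[False] * cols for _ in range(rows)]
    beads = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or sub[r][c] <= threshold:
                continue
            # Breadth-first flood fill collects one connected object.
            pixels, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append(sub[y][x])
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and sub[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = len(pixels)
            if not min_area <= area <= max_area:
                continue  # size rejection: specks or clumped beads
            if percentile(pixels, sat_percentile) >= sat_value:
                continue  # saturation rejection
            beads.append((sum(pixels) / area, area))
    return beads

def paint(img, r0, r1, c0, c1, value):
    """Hypothetical helper: fill a rectangle of a synthetic test image."""
    for r in range(r0, r1):
        for c in range(c0, c1):
            img[r][c] = value

img = [[10] * 60 for _ in range(60)]
paint(img, 10, 16, 10, 16, 40)    # bead, area 36
paint(img, 30, 32, 30, 32, 40)    # speck, area 4 (rejected by size)
paint(img, 40, 46, 40, 46, 255)   # saturated bead (rejected)
paint(img, 0, 3, 0, 3, 40)        # edge artifact outside the ROI
print(find_beads(img, roi=(5, 55, 5, 55)))  # [(40.0, 36)]
```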

Table 2. Sequences of identified SLs.

SL Hit   Sequence                    Glycoprotein Screened   Glycoprotein Selectivity
SL1      Ac-RGD*VTFD*R-BBRM-resin    OVA                     Cross reactive
SL2      Ac-RTD*RFLD*V-BBRM-resin    OVA                     OVA
SL3      Ac-RSD*VTTD*R-BBRM-resin    OVA                     OVA
SL4      Ac-RRD*TQTD*Q-BBRM-resin    PSM                     OVA, PSM
SL5      Ac-RAD*TRVD*V-BBRM-resin    PSM                     PSM
SL6      Ac-RTD*NRND*F-BBRM-resin    PSM                     OVA, BSM
SL7      Ac-RSD*YFTD*Q-BBRM-resin    PSM                     OVA, PSM
SL8      Ac-RTD*YGND*N-BBRM-resin    PSM                     PSM
SL9      Ac-RTD*YQVD*A-BBRM-resin    PSM                     OVA, PSM

We have continued to optimize our data acquisition, extraction and analysis protocols. In particular, we made an integral change to our MATLAB algorithm to improve the identification and quantification of individual assay beads. One challenge we continually face is extracting data from dark images produced by weak binding between an SL and a given analyte. We must still be able to compare these results with confidence against those from SLs that bind more analyte and are therefore much brighter. At the heart of this challenge is accurately locating the edge of a dark bead against the background.

Because our analysis is typically based on luminosity or brightness measurements, we always found particles by a fold-change over background. This was done on a greyscale image produced by merging the red, green and blue channels from our color camera. The fold-change value can readily be lowered to reduce the threshold, but this often produced blurry edges and increased variability in our measurements. In the new MATLAB algorithm, we instead find the particles (i.e., identify their edges) using the color channel that carries the most information. For example, we use the green channel for fluorescein and the red channel for rhodamine. With this new design, we can reliably and consistently identify beads with intensities around 5 on an 8-bit scale, whereas the previous protocol limited us to beads with intensities closer to 15.
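A simple calculation shows how merging channels can weaken edge detection. Averaging R, G and B into one greyscale value dilutes a signal that lives mostly in a single channel. The sketch below assumes an unweighted RGB average for the merge and one representative pixel each for a bead and for the background. As a hypothetical selection criterion, it picks the channel with the largest bead-minus-background contrast. The pixel values are invented.

```python
def greyscale(pixel):
    """Unweighted mean of R, G and B (an assumed merge; cameras may weight channels)."""
    return sum(pixel) / 3

def best_channel(bead, background):
    """Index (0=R, 1=G, 2=B) of the channel with the largest bead-minus-background contrast."""
    contrasts = [bd - bg for bd, bg in zip(bead, background)]
    return max(range(3), key=lambda i: contrasts[i])

background = (1, 1, 1)
dim_fitc_bead = (1, 6, 1)   # weak fluorescein signal, almost entirely green

ch = best_channel(dim_fitc_bead, background)
print(ch)                                                 # 1 (green)
print(dim_fitc_bead[ch] - background[ch])                 # 5
print(greyscale(dim_fitc_bead) - greyscale(background))   # about 1.67
```

Here, a contrast of 5 in the green channel shrinks to under 2 after merging, so a fold-change threshold applied to the merged image sees a much weaker edge.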

Task 1 c): Upon identifying >5 hits, we will sequence, resynthesize, and determine the selectivity of identified hits towards the target that they were selected against as well as the other prostate cancer associated glycoproteins and complex glycans. (Months 3-36)

As described above, the four new hits listed in Table 2 (SL6-SL9) were sequenced using MS/MS techniques and resynthesized on TentaGel resin. To identify general selectivity trends, and for comparison with the original five SLs, each SL was incubated with three glycoproteins (OVA, BSM and PSM) and with BSA. BSA served as the control for nonspecific protein binding to the beads. Briefly, the library and the SLs were blocked with 1% BSA to minimize nonspecific binding and then incubated with 0.1 mg/mL FITC-labeled analytes for 16 hours. After washing with PBS to remove unbound analyte, the beads were imaged using a fluorescence microscope, and color data were extracted using the MATLAB algorithm described above.

The library was used as a control to account for differences between glycoproteins in the extent of fluorescent labeling and degree of glycosylation. As such, the average raw intensity of the library was subtracted from each replicate measurement for each SL binding each analyte. This difference was then divided by the raw intensity of the library to give a relative percent change for each SL binding each analyte. As shown in Figure 6, all of the SLs are cross-reactive to some degree.

For example, SL1 is considered completely cross-reactive, showing virtually no selectivity for any particular analyte. In contrast, SL5 and SL6 display exquisite selectivity for PSM over BSM (~50-fold) and for BSM over PSM (~60-fold), respectively. The remaining newly identified SLs show between 1.6- and 18-fold selectivity for one analyte over another.
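The normalization described above is (I_SL - mean(I_library)) / mean(I_library) × 100, computed for each replicate. The sketch below applies it to one SL and one analyte. The function name and the intensities are hypothetical. Aggregating replicates by their mean is an assumption, since the text does not say how replicates are combined.

```python
from statistics import mean

def relative_percent_change(sl_replicates, library_intensities):
    """Percent change of each SL replicate relative to the library average.

    Computes (I_SL - mean(I_lib)) * 100 / mean(I_lib) for every replicate
    and returns (mean over replicates, per-replicate list).
    """
    lib = mean(library_intensities)
    per_replicate = [(x - lib) * 100 / lib for x in sl_replicates]
    return mean(per_replicate), per_replicate

# Hypothetical raw intensities for one SL and for the library, same analyte.
avg, reps = relative_percent_change([30, 32, 28], [20, 22, 18])
print(reps)  # [50.0, 60.0, 40.0]
print(avg)   # 50.0
```

Dividing by the library response expresses each SL's binding relative to how brightly that particular labeled glycoprotein stains the beads overall. This is what makes values comparable across OVA, BSM, PSM and BSA.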

While continuing to evaluate these hits, one theme has become obvious. The cross-reactive SLs that show modest selectivity in the chicken-cow-pig (CCP) paradigm (Figure 6) typically provide the most useful information when assaying cancer related samples. The paradigm is named for ovalbumin (chicken), bovine mucin (cow) and porcine mucin (pig). By the same token, the high selectivity that can be achieved between cow and pig mucin (e.g., SL5, ~50-fold selectivity) generally does not translate into effective discriminatory capability within our SL Array.

For example, an SL Array composed of SL1-SL9 was used to evaluate the metastatic potential of six prostate derived cell lines:

- RWPE-1 (healthy);
- WPE1-NA22 and WPE1-NB14 (cancerous, non-metastatic);
- LNCaP, DU145 and PC-3 (cancerous, metastatic).

The array classified these lines with 100% accuracy. Notably, 66% of the variance (i.e., the discriminatory ability of the array) is accounted for by SL2 and SL3, at 25% and 41%, respectively. Recall that we had previously excluded SL2 for two reasons: its response to purified glycoproteins was similar to that of SL3, and it showed high BSA background binding. Ultimately, the take-home lessons for us have been 1) that we cannot take any SL for granted, and 2) that identifying SLs from more biologically relevant samples could provide better classification. Such SLs could also yield more detailed information about the glycosylation patterns associated with a particular disease state.

Figure 6. SL selectivity trends. Relative percent change in luminosity for SL1-SL9 binding ovalbumin (OVA), bovine submaxillary mucin (BSM), porcine stomach mucin (PSM) and bovine serum albumin (BSA).

Task  2.  Initiating  PI:  Examine  the  biochemical/biophysical  basis  of  the  glycan-SL  interaction  (Months  3-36)

Task  2  a):  Upon  identifying  >5  hits  (Task  1),  we  will  develop  a  structure-activity  relationship  for  highly
selective  SLs  based  on:  1)  Alanine  scanning  ‘mutagenesis’;  2)  Varying  the  tether  length;  3)  Varying  the
boronic  acid  linkage  and  substitution  patterns;  and  4)  Examining  boronic  acid  substituent  effects,  to  identify
the  factors  that  promote  the  selective  recognition  of  a  glycan  by  a  particular  SL.  (Months  3-32)

While  we  have  had  previous  success  using  2-phenylboronic  acid  as  our  glycan-targeting  moiety,  we  also
wanted  to  see  if  the  recently  described  benzoboroxole  would  serve  as  a  more  suitable  boronic  acid.  We  first
synthesized  the  carboxy-benzoboroxole  (Figure  7A)  and  then  coupled  it  to  the  same  side-chain  Dab  amine  on
SL5  as  was  used  for  the  PBA  derivative.  Interestingly,  benzoboroxole-SL5  showed  increased  affinity  for  PSM
when  compared  to  the  original  PBA  derivative  (Figure  7B).  Due  to  the  improved  affinity,  we  built  both  a
peptide  library  (diversity  =  11^5;  1.6  ×  10^5  members)  and  a  peptoid  library  (diversity  =  9^5;  5.9  ×  10^4
members)  incorporating  the  benzoboroxole  moiety.
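The  library  sizes  follow  from  simple  combinatorics:  with  b  building  blocks  at  each  of  p  randomized  positions,
a  split-and-pool  (one-bead-one-compound)  library  contains  b^p  distinct  members.  The  sketch  below  assumes
five  randomized  positions,  an  inference  from  the  quoted  member  counts  (11^5  ≈  1.6  ×  10^5  and  9^5  ≈  5.9  ×
10^4)  rather  than  a  number  stated  in  the  text;  the  helper  name  is  hypothetical.

```python
def library_diversity(building_blocks: int, variable_positions: int) -> int:
    """Number of distinct sequences in a split-and-pool library (hypothetical helper)."""
    return building_blocks ** variable_positions

# Five variable positions is an inference from the quoted member counts.
peptide = library_diversity(11, 5)
peptoid = library_diversity(9, 5)
print(f"peptide: {peptide} ({peptide:.1e})")  # → peptide: 161051 (1.6e+05)
print(f"peptoid: {peptoid} ({peptoid:.1e})")  # → peptoid: 59049 (5.9e+04)
```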

While  we  were  able  to  successfully  screen  the  peptide  library  and  identify  a  hit  (“Box1”,  MRBB-VDARTDGR),
sequencing  the  boroxole  hits  has  been  challenging  due  to  the  effect  of  the  benzoboroxole  moiety  on
ionization.  As  such,  we  are  optimizing  a  variety  of  oxidation  and  cross-coupling  steps  that  we  expect  will
efficiently  remove  the  benzoboroxole  functionality  and  thereby  facilitate  successful  sequencing  of  hits.
Additional  structure-activity  relationships  will  be  determined  once  we  accumulate  >5  hits.

Task  2  b):  Examine  the  contribution  of  multivalency  towards  binding  affinity/selectivity  of  particular  SLs.
Synthesize  mono-,  di-,  and  tri-topic  versions  of  the  SLs  identified  in  AIM  1  and  evaluate  their  importance
for  glycan  binding  and  selectivity.  (Months  18-36).

To  examine  the  effects  of  multivalency  on  SL  affinity  and  selectivity,  we  have  begun  to  synthesize  a
monovalent  SL5,  two  trivalent  SL5  analogs  that  vary  the  distance  between  the  SL  chains,  and  a  polyvalent
SL5  based  on  poly(acrylic  acid)  (Figure  8).  In  our  initial  efforts,  the  mono-  and  trivalent  SL5  analogs  are
being  synthesized  on  Rink  Amide  resin  using  HBTU/Fmoc  chemistry.  The  first  amino  acid  attached  to  the
resin  was  Fmoc-Cys(Mmt)-OH  because  the  Mmt  group  can  be  removed  orthogonally  to  the  other  protecting
groups  used  in  the  synthesis,  allowing  a  reporter  dye  to  be  installed  using  thiol-selective  maleimide
chemistry  (recall  that  Cys  is  not  used  in  the  library  synthesis).  For  the  monovalent  SL5,
Fmoc-Lys(ivDde)-OH  was  next  coupled.  The  N-terminus  was  acylated,  the  side  chain  of  lysine  was
deprotected  with  hydrazine  and  the  desired  SL  synthesized  using  standard  methods.

Synthesis  of  the  trivalent  SL5  analogs  begins  like  that  of  the  monomeric  derivative;  however,  instead  of
acylating  the  N-terminus,  the  chain  is  extended  with  either  one  or  three  glycine  residues  as  spacers  between
the  lysine  branches  that  carry  SL5  (Figure  8).  Differing  numbers  of  glycine  residues  can  be  incorporated
between  the  Lys  units  to  explore  the  effect  of  SL  density  on  glycan  binding.  Sequential  addition  of  Lys  and
glycine  spacers  provides  the  desired  tri-topic  scaffolds.  Subsequently,  the  lysine  side-chains  are  deprotected
with  hydrazine  and  the  desired  SL  is  synthesized  in  triplicate,  in  parallel,  using  standard  methods.  After  the
Dab  side-chains  are  deprotected  using  hydrazine  and  the  boronic  acids  are  coupled  via  reductive  amination,
the  Mmt  protecting  group  on  Cys  is  removed  using  1%  TFA  and  the  free  thiol  is  attached  to  a
maleimide-containing  reporter  tag  (there  is  no  interference  with  the  boronic  acid).  To  obtain  additional
information  related  to  the  intensity  of  the  fluorescence  signal  upon  binding,  as  well  as  to  probe  the
structure-activity  relationship  related  to  functionalization  of  the  SL  termini,  we  have  also  attached
fluorescent  dyes  to  the  terminal  amine  of  the  Gly-Lys  scaffold  as  well  as  to  the  N-terminus  of  each  SL.
Finally,  the  SL  analogs  are  cleaved  from  the  resin  with  concurrent  removal  of  the  acid-labile  side-chain
protecting  groups  using  95%  TFA.

With  these  synthetic  steps  completed,  we  are  currently  working  to  purify  and  validate  the  structure  of  these
SLs.  Purification  of  the  trivalent  SL5  has  proven  to  be  a  non-trivial  task  and  we  are  having  difficulty  fully
characterizing  these  novel  structures.  Consequently,  we  are  proceeding  to  a  higher-order,  more  broadly
defined  polyvalent  SL5  derived  from  commercially  available  poly(acrylic  acid)  (PAA).  Modification  of  PAA
begins  with  using  EDC/HOBt  to  couple  each  acid  side-chain  with  1,4-diaminobutane.  The  resulting
amine-functionalized  polymer  is  then  partially  derivatized  with  maleic  anhydride  to  afford  the
maleimide-modified  PAA,  which  can  be  coupled  with  our  Cys-terminated  monovalent  SL5.  Any  remaining
primary  amines  can  be  left  free  to  afford  an  overall  positively  charged  polymer,  acylated  to  provide  a  neutral
polymer,  reacted  with  succinic  anhydride  to  obtain  an  anionic  polymer,  or  partially  modified  using  any  of
these  methods  to  tune  the  charge  on  the  polymer  backbone.  We  are  just  now  beginning  this  synthesis  and  are
excited  about  the  opportunities  this  approach  offers  to  vary,  in  a  controllable  manner,  the  SL  density  and
overall  ensemble  charge.  Affinity  and  selectivity  of  each  SL  analog  will  be  studied  using  a  Fluorescence
Polarization  (FP)  assay  established  in  the  PIs’  labs  and/or  a  microtiter  plate-based  approach  relying  on
immobilization  of  the  glycoprotein  followed  by  “staining”  with  the  labeled  SL  analog.

Task  2  c):  Feed  information  from  the  above  studies  back  into  the  library  design  process  to  aid  the  generation
and  subsequent  identification  of  highly  selective  SLs.  (Months  9-32).

Based  on  our  experience  with  the  benzoboroxole,  which  improved  the  affinity  of  SL5  for  PSM,  we  are
focused  on  incorporating  this  moiety  into  libraries  once  we  optimize  library  sequencing.  The  lessons  learned
from  the  Partnering  PI’s  structure-activity  relationships  are  also  being  incorporated  into  the  design  process
(see  below).

Task  3.  Partnering  PI:  Examine  the  biochemical/biophysical  basis  of  the  glycan-SL  interaction  and  develop
SL-based  sensor  arrays  for  the  proposed  prostate  cancer  diagnostic.  (Months  1-36)

Task  3  a):  Develop  a  structure-activity  relationship  for  previously  identified  SLs  (SL2  and  SL5)  based  on:
1)  Alanine  scanning  ‘mutagenesis’;  2)  Varying  the  tether  length;  3)  Varying  the  boronic  acid  linkage  and
substitution  patterns;  and  4)  Examining  boronic  acid  substituent  effects  to  identify  the  factors  that  promote
the  selective  recognition  of  a  glycan  by  a  particular  SL.  (Months  1-12)

In  our  analysis  of  how  structure  impacts  binding  affinity  and  selectivity  of  SLs  for  glycoproteins,  we  have
identified  some  expected  and  some  unexpected  correlations.  These  studies  have  largely  revolved  around  SL2
and  SL5  because  they  represent  opposite  ends  of  the  spectrum:  SL2  displayed  modest  selectivity  (~2-fold)
with  high  background  binding,  while  SL5  exhibited  high,  nearly  50-fold  selectivity  with  low  non-specific
binding.  In  selecting  these  two  SLs  we  wanted  to  learn  more  about  which  factors  most  significantly  impact
binding  for  highly  selective  and  modestly  selective  SLs,  to  better  understand  whether  the  same  factors  are
important  for  each.  In  the  end,  we  are  focused  on  improving  our  approach  towards  generating  new  SLs
capable  of  effectively  discriminating  between  healthy  and  cancerous  samples.

Using  alanine  scanning  mutagenesis  with  SL2  binding  OVA  (Task  3  a-1,  Figure  9A),  we  see  that  charge  on
the  peptide  is  important  for  binding  affinity.  Specifically,  replacing  R4  with  alanine  causes  a  60%  decrease  in
binding  compared  to  native  SL2.  Similarly,  replacing  R1  or  the  arginine  found  in  the  C-terminal  MRBB
sequence  also  reduces  binding,  though  to  a  lesser  extent  (45%  and  24%,  respectively).  Likewise,  binding
affinity  is  reduced  by  more  than  50%  when  the  aminomethyl-phenylboronic  acids  (D*3,7  =  Dab-2-PBA)  are
replaced  with  alanine  or  phenylalanine.  However,  when  the  Dab  residues  were  left  unmodified  or  alkylated
with  benzaldehyde,  thereby  leaving  the  charged  ammonium  at  neutral  pH,  binding  affinity  was  diminished  by
only  2-3%.  Similar  trends  were  observed  for  SL5  binding  PSM  (Figure  9B).  For  example,  when  R5  was
replaced  by  alanine,  binding  decreased  by  nearly  55%,  and  replacing  both  D*  residues  with  alanine  resulted  in
a  65%  decrease  in  binding.  Interestingly,  when  T4  was  replaced  by  alanine,  PSM  binding  was  enhanced  by
25%.  Similarly,  when  V6  or  V8  was  replaced  with  alanine,  a  20%  and  5%  increase  in  PSM  binding  was
observed,  respectively.
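The  percentages  above  compare  each  mutant  with  its  native  SL.  A  minimal  sketch  of  that  comparison  follows;
the  helper  and  mutant  labels  are  hypothetical,  and  the  signals  are  illustrative  values  chosen  so  the  output
echoes  the  60%,  45%  and  24%  decreases  quoted  for  SL2,  not  measured  data.

```python
def percent_change_vs_native(mutant_signal: float, native_signal: float) -> float:
    """Percent change in binding of a mutant relative to the native SL (hypothetical helper).

    Negative values mean weaker binding than the native sequence.
    """
    return (mutant_signal - native_signal) * 100.0 / native_signal

# Illustrative SL2 binding signals (arbitrary units), not measured data.
native_sl2 = 200.0
mutants = {"R4A": 80.0, "R1A": 110.0, "MRBB-R-to-A (hypothetical label)": 152.0}
for name, signal in mutants.items():
    print(name, percent_change_vs_native(signal, native_sl2))
```

Expressing  each  mutant  as  a  change  relative  to  the  native  sequence  lets  losses  (e.g.  R4A)  and  gains  (e.g.  the
T4A  mutant  of  SL5)  be  read  on  the  same  scale.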

The  role  that  the  boronic  acids  play  in  defining  SL  binding  affinity  and  selectivity  was  also  studied
(Task  3  a-3).  In  general,  no  loss  of  affinity  was  observed  when  regio-isomeric  phenylboronic  acids  (PBAs)
were  used  in  SL2  and/or  SL5,  yet  the  PBA  is  undoubtedly  important  for  defining  selectivity  (Figure  10).  As
seen  in  Figure  10A,  there  is  no  appreciable  change  in  the  selectivity  patterns  whether  the  boronic  acid  is
ortho-,  meta-  or  para-  to  the  linkage  to  the  peptide.  This  observation  was  unexpected  1)  because  of  the
expected  conformational  preferences  for  sugar  binding  based  on  positioning  the  boronic  acid  in  a  specific
orientation  to  bind  the  sugar,  and  2)  because  enhanced  diol  binding  is  expected,  due  to  conformational  and
Lewis  acidity  trends,  when  the  boronic  acid  is  ortho-  to  the  aminomethyl  group.  Still,  binding  between
saccharides  and  meta-linked  boronic  acids  has  been  observed,  particularly  in  polyvalent  systems,  which
provides  support  for  this  observation.  When  the  more  sterically  crowded  and  conformationally  restricted
2-Ac-PBA  is  incorporated  into  SL2,  the  binding  preference  for  OVA  actually  increases,  though  modestly  (from
3-fold  to  ~6-fold).  Most  notably,  however,  when  the  PBA  is  replaced  with  a  simple  benzyl  group,  all
selectivity  is  lost.  SL5  showed  similar  trends  (Figure  10B),  with  the  orientation  of  the  boronic  acid  having  no
significant  influence  on  glycoprotein  binding.

Interestingly,  in  contrast  to  what  was  observed  for  SL2,  the  binding  selectivity  of  SL5  decreased  when  the
bulky  2-Ac-PBA  was  used,  perhaps  providing  some  insight  into  the  steric  and/or  hydrophobic  nature  of  the
bound  sugar  environment.

The  final  boronic  acid  modification,  adding  electron-donating  (-OCH3,  -NR2)  and  electron-withdrawing
(-CF3,  -NO2,  -CN)  substituents  onto  the  PBA  to  alter  the  Lewis  acidity  of  the  boronic  acid  (Task  3  a-4),
unquestionably  showed  no  impact  on  analyte  binding.

The  length  of  the  side-chain  connecting  the  PBA  to  the  peptide  (i.e.,  the  tether  length,  Task  3  a-2)  was  also
investigated.  For  this  analysis,  Dab  and  Lys  were  incorporated  as  the  amino  acid  to  which  the  boronic  acid
was  attached,  in  order  to  probe  how  degrees  of  freedom,  and  thus  preorganization,  can  impact  binding
selectivity.  Figures  11A  and  B  show  representative  fluorescence  images  of  portions  of  two  libraries,  derived
independently  from  attachment  of  PBA  to  either  Dab  or  Lys,  after  incubation  with  FITC-OVA.  The
Dab-based  library  displays  decreased  non-selective  binding,  as  indicated  by  the  decreased  background
fluorescence  and  increased  library  differentiation.  Figure  11C  is  a  binning  chart  in  which  individual  bead
luminosities  are  plotted  for  each  library.  The  greater  spread  in  the  data  obtained  for  the  Dab-containing
library,  versus  the  otherwise  identical  Lys-containing  library,  indicates  greater  differentiation  and  selectivity
for  binding  the  targeted  glycoprotein.
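A  binning  chart  like  Figure  11C  can  be  built  by  counting  bead  luminosities  per  fixed-width  bin,  and  the
“spread”  can  be  summarized  numerically.  This  sketch  uses  the  interquartile  range  as  the  spread  measure,
which  is  an  assumption  (the  text  does  not  name  a  metric);  the  helper  names,  bin  width  and  luminosity  values
are  hypothetical  and  for  illustration  only.

```python
from statistics import quantiles

def bin_luminosities(values, bin_width):
    """Count beads per luminosity bin; keys are the lower bin edges (hypothetical helper)."""
    counts = {}
    for v in values:
        lower = (v // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

def interquartile_range(values):
    """Spread of bead luminosities as Q3 - Q1 (inclusive quartile method, an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Illustrative bead luminosities only: a narrowly clustered library and a widely spread one.
narrow = [28, 29, 30, 31, 32, 30, 29, 31]
wide = [20, 35, 60, 90, 150, 40, 25, 200]
print(bin_luminosities(wide, 50))
print(interquartile_range(narrow), interquartile_range(wide))
```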

As  a  final  investigation  of  how  structure  can  impact  binding  between  SL  and  glycan,  we  looked  at  what
impact  the  fluorescent  label  could  have.  SL1-SL5  are  cationic,  each  containing  a  minimum  of  three  arginine
residues,  and  fluorescein  is  anionic  at  physiological  pH.  Based  on  what  we  learned  about  how  charge
impacts  affinity  in  our  alanine  scanning  mutagenesis  studies,  we  wanted  to  determine  how  the  dye  charge  was
impacting  binding  affinity.  We  therefore  labeled  each  of  our  glycoproteins  separately  with  coumarin  (as  a
neutral  alternative)  and  rhodamine  (as  a  cationic  alternative).


Figure  11.  Representative  fluorescence  images  of  libraries  incorporating  Dab  (A)  or  Lys  (B)  upon  binding
FITC-OVA.  C)  A  binning  chart  of  individual  bead  luminosities  showing  decreased  background  fluorescence
and  increased  differentiation  when  Dab  is  used  compared  to  Lys.

Figure  10.  Impact  on  binding  selectivity  between  SL  and  different  test  glycoproteins  (OVA,  BSM,  PSM,  BSA)
when  different  regio-isomeric  boronic  acids  are  used  for  SL2  (A)  and  SL5  (B).

If  the  charge  on  the  dye  significantly  impacts  the  affinity  of  the  SL  for  any  given  glycoprotein,  we  should
see  a  decrease  in  the  binding  response  as  we  move  from  fluorescein  to  coumarin,  which  is  in  fact  what  we
observe  (Figure  12).  Still,  rhodamine-labeled  glycoproteins  would  be  expected  to  have  a  further  reduced
binding  affinity  due  to  the  cationic  dye,  which  is  contrary  to  our  results.  We  conclude  from  this  that  our
microscope  filter  set  is  somehow  inappropriate  for  the  coumarin  dye  we  are  using,  even  though  the  stated
wavelengths  seem  relevant.  Regardless,  we  are  much  more  confident  that  labeling  our  targets  is  an
appropriate  method  for  identifying  diagnostic  hits.

Efforts  to  evaluate  and  understand  the  SL-glycan  binding  interaction  have  continued  into  the  subsequent  years
of  funding,  while  previous  results  have  been  fed  back  into  our  SL  design.  For  example,  based  on  the
remarkable  correlation  between  SL  charge  and  binding  affinity,  we  have  made  certain  to  include  at  least  one
arginine  residue  near  each  terminus  of  our  SLs,  while  also  taking  a  closer  look  at  the  importance  of  arginine
residues  near  the  middle  of  our  SLs.  It  is  clear  that  removal  of  any  of  the  positively  charged  arginine  residues
from  the  basic  SL5  sequence  reduces  binding  affinity  (Figure  13).  In  initial  studies  (discussed  above),  the
R5A  mutant  of  SL5  showed  nearly  a  55%  decrease  in  binding  to  PSM  compared  to  native  SL5.  In  the  present
studies,  we  observe  nearly  a  40%  decrease  for  the  same  mutant.  Importantly,  the  trend  remains  the  same;  the
small  difference  in  these  observed  changes  likely  results  from  a  change  in  glycoprotein  concentration
(0.5  mg/mL  (old)  to  0.1  mg/mL  (new)).  This  reduction  in  analyte  concentration  was  made  to  help  reduce  our
hit  rate,  so  that  we  can  focus  on  tighter-binding  SLs,  while  also  helping  to  reduce  cost  when  using  clinically
relevant  glycoproteins.

In  further  evaluating  binding  selectivity,  alanine  scanning  mutagenesis  was  carried  out  to  study  SL2  and  its
mutants  binding  to  the  proof-of-concept  glycoproteins  (BSA,  BSM,  OVA,  PSM).  Recall  that  previous  studies
only  examined  how  these  mutants  bound  to  OVA.  As  indicated  in  Figure  14A,  removal  of  any  of  the  charged
arginine  residues  (SL2-no  MRBB,  SL2-R4A,  SL2-R1A)  results  in  a  loss  of  binding,  though  the  relative
binding  pattern  for  the  four  glycoproteins  remains  virtually  unchanged,  with  the  exception  of  SL2-R4A,
where  the  BSM/PSM  selectivity  inverts.  Similarly,  removing  the  Dab  and  the  PBA  causes  a  decrease  in
binding  affinity  for  all  glycoproteins  studied.  Interestingly,  for  the  SL2-D*3,7F  mutant  the  binding
selectivity  is  changed  such  that  this  SL  prefers  binding  to  FITC-BSA,  even  in  the  presence  of  1%  (w/w)  BSA.
Finally,  upon  reintroduction  of  the  Dab  residue  as  either  the  primary  (SL2-Dab)  or  secondary  amine
(SL2-Bn),  while  still  lacking  the  boronic  acids,  the  overall  affinity  is  recovered  but  the  selectivity  is
dramatically  reduced,  indicating  that  the  charge  on  the  SL2  mutants  (+5  in  each  of  these  mutants)  is
important  for  binding  but  not  necessarily  for  discriminating  between  these  different  glycoproteins.


Figure  13.  SL5  mutant  selectivity  trends  based  on  fraction  bound  compared  with  SL5  binding  ovalbumin  (OVA),
bovine  submaxillary  mucin  (BSM),  porcine  stomach  mucin  (PSM)  and  bovine  serum  albumin  (BSA).

Figure  12.  Relative  binding  response  for  SL2-SL5  binding  with  PSM,  labeled  with  coumarin  (blue),
fluorescein  (green)  or  rhodamine  (red).

Furthermore,  we  had  previously  shown  that  by  shortening  the  amino  acid  side-chain  onto  which  the  PBA  is
attached  from  four  methylenes  (lysine)  to  two  methylenes  (Dab),  we  can  reduce  non-specific  background
binding  to  the  boronic  acids  and  thereby  increase  selectivity  by  taking  advantage  of  pre-organization
(Figure  11).  We  took  this  analysis  one  step  further,  moving  from  two  to  one  methylene  spacer  between  the
peptide  backbone  and  the  PBA  attachment  point  (Dab  to  Dpr  (diaminopropanoic  acid),  respectively).
Simultaneously,  we  evaluated  the  significance  of  the  boronic  acids  on  SL5  and  its  Dpr  mutants.  The  first
pattern  in  Figure  14B,  labeled  “SL5  raw,”  simply  shows  the  response  of  SL5  to  our  four  proof-of-concept
glycoproteins,  normalized  by  the  greatest  intensity.  Note  that  even  in  this  raw  form,  SL5  still  binds  most
significantly  with  PSM.  The  second  pattern,  for  SL5,  shows  the  simple  normalization  (to  one)  of  the
response  of  SL5  binding  to  each  glycoprotein  (as  a  reference  for  comparison).  Note  that  when  the  boronic
acids  are  not  included,  as  depicted  for  SL5-Dab,  the  affinity  (indicated  by  reduced  bar  size)  and  selectivity
(indicated  by  the  pattern  being  the  inverse  of  that  for  SL5  raw)  displayed  by  SL5  are  lost,  providing
additional  evidence  for  the  importance  of  the  boronic  acids  in  our  approach.  While  we  previously  saw  that
SL2-Dab  maintained  high  affinity  for  OVA  even  without  the  boronic  acids  (Figure  14A),  SL5-Dab  does  not
follow  this  trend,  even  though  both  SL  mutants  carry  an  overall  +5  charge.  This  is  not  a  surprising  result,
because  we  expect  the  boronic  acids  and  the  peptide  sequence  to  be  more  significant  in  defining  the  SL5
binding  interactions  than  they  are  for  SL2,  given  the  higher  selectivity  exhibited  by  SL5  compared  to  SL2.

We  are  also  continuing  to  examine  our  analysis  protocols  that  define  the  relative  response  of  each  SL  for  a
series  of  different  analytes.  In  the  case  of  studying  purified  glycoproteins,  we  previously  used  the  average
response  from  20  library  beads  as  a  reference.  While  this  does  provide  a  control  set  containing  all  of  the
potential  cross-reactive  elements  that  could  interfere  with  our  assessment  of  binding  selectivity,  it  is  also
susceptible  to  large  variations  depending  on  sample  size.  If  we  were  to  use  our  entire  library,  this  would  be
an  ideal  reference;  however,  new  “reference  libraries”  would  need  to  be  synthesized  and  evaluated  for  each
glycoprotein  each  time  new  samples  were  made  (to  account  for  labeling  variation),  and  this  is  not  practical.
As  this  method  has  been  used,  i.e.  with  small  sample  sets  (n  =  20),  the  inclusion  of  one  “hit”  within  the
library  “reference”  data  is  sufficient  to  shift  the  average  by  10-30%,  based  on  typical  luminosity  values  for
identifying  a  hit  (e.g.  assume  an  average  library  background  luminosity  of  ~30  for  n  =  20;  including  one
“modest  hit”  (luminosity  ~100)  changes  this  average  to  ~33,  and  including  one  good  hit  (luminosity  ~200)
changes  the  average  to  ~38;  if  n  =  15,  the  range  changes  to  15-40%  variability).
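The  arithmetic  in  the  parenthetical  above  can  be  reproduced  directly.  The  sketch  assumes  that  n  -  1  reference
beads  sit  at  the  background  luminosity  and  one  bead  is  a  hit;  the  background  (~30),  hit  luminosities  (~100
and  ~200)  and  sample  sizes  (20  and  15)  are  the  values  given  in  the  text,  while  the  helper  names  are
hypothetical.

```python
def reference_average(background, n, hit):
    """Average of n reference beads when exactly one of them is a hit (hypothetical helper)."""
    return ((n - 1) * background + hit) / n

def percent_shift(background, n, hit):
    """How far that single hit pushes the reference average above background, in percent."""
    return (reference_average(background, n, hit) - background) / background * 100.0

for n in (20, 15):
    for hit in (100, 200):
        print(f"n={n}, hit={hit}: average {reference_average(30, n, hit):.1f}, "
              f"shift {percent_shift(30, n, hit):.0f}%")
```

This  gives  averages  of  33.5  and  38.5  for  n  =  20  (shifts  of  about  12%  and  28%)  and  roughly  16%  and  38%  for
n  =  15,  consistent  with  the  10-30%  and  15-40%  ranges  quoted  above.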


Figure  14.  A)  Binding  selectivity  for  SL2  and  SL2  mutants  (SL2-(no  MRBB),  SL2-(R4A),  SL2-(R1A),
SL2-(D*3,7A),  SL2-(D*3,7F),  SL2-(Dab),  SL2-Bn),  and  B)  raw  binding  intensities  for  SL5  and  SL5  mutants
(SL5  raw,  SL5,  SL5-Dab,  SL5-Dpr*,  SL5-Dpr),  all  responding  to  OVA,  BSM,  PSM  and  BSA.

As  a  consequence  of  this  variability,  we  have  evaluated  other  means  of  accounting  for  instrument,  user  and
labeling  variability.  As  a  first  response,  regular  examination  of  the  optical  set-up  using  (commercially
available)  NIST-certified  control  particles  ensures  the  consistency  of  our  hardware  set-up.  We  have  also
evaluated  a  number  of  “SL  control  sequences”  to  be  used  as  reference  controls,  including:  acylated  resin,
MRBB,  octa-Ala  and  SL5  (Figure  15).  The  acylated  resin  does  not  bind  appreciably  with  anything;
consequently,  when  we  image  the  beads  after  incubation  with  tagged  analyte,  we  most  often  cannot  even  find
the  particles  to  measure.  The  MRBB  and  octa-Ala  sequences  coat  the  resin  particles  with  a  peptide-based
structure  while  offering  little  to  no  structure  to  afford  discrimination  of  analytes  beyond  the  inherent
stickiness  of  the  sample.  Due  to  the  similarity  in  the  response  from  each  of  these  models,  we  have  focused
primarily  on  the  MRBB  sequence.  While  the  “look”  of  the  SL  binding  pattern  for  each  SL  changes,  the
general  trends  are  maintained.  SL5  is  an  interesting  option  that  we  have  used  extensively,  particularly  in  any
of  the  figures  in  this  document  with  a  Y-axis  label  reading  “Fraction  Bound”.  Using  an  existing  SL  as  a
reference  offers  many  benefits;  perhaps  most  importantly,  it  provides  a  known  response  pattern  to  standard
glycoproteins  that  can  be  statistically  evaluated  to  assess  the  differences  and,  by  association,  the  similarities
between  runs  (e.g.  using  MANOVA),  subsequently  producing  a  correction  factor  if  needed.  As  a  minor
downside  to  this  approach,  the  analytes  that  bind  best  to  the  reference  SL,  by  definition,  return  the  smallest
response  from  the  other  SL  Array  members  due  to  the  mathematical  approach  (i.e.  dividing  by  a  large
number).  Nonetheless,  we  are  continuing  to  evaluate  these  approaches,  and  regardless  of  how  we  manipulate
our  raw  data,  the  final  analysis  has  been  quite  consistent.
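A  minimal  sketch  of  the  “Fraction  Bound”  normalization  against  a  reference  SL  follows.  It  assumes  that
Fraction  Bound  is  the  per-analyte  ratio  of  an  SL's  response  to  that  of  the  reference  SL  (SL5)  measured  in  the
same  run;  the  helper  name  and  intensities  are  hypothetical.  The  example  also  shows  the  downside  noted
above:  the  analyte  the  reference  binds  most  strongly  (PSM  here)  yields  the  smallest  ratio.

```python
def fraction_bound(sl_response, reference_response):
    """Express an SL's response to each analyte relative to a reference SL (hypothetical helper).

    Both arguments map analyte name -> intensity measured in the same run.
    """
    return {analyte: sl_response[analyte] / reference_response[analyte]
            for analyte in sl_response}

# Hypothetical intensities for illustration only.
reference_sl5 = {"OVA": 40.0, "BSM": 20.0, "PSM": 200.0, "BSA": 10.0}
test_sl = {"OVA": 60.0, "BSM": 30.0, "PSM": 100.0, "BSA": 10.0}
print(fraction_bound(test_sl, reference_sl5))
# → {'OVA': 1.5, 'BSM': 1.5, 'PSM': 0.5, 'BSA': 1.0}
```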


With  respect  to  the  cell  and  tissue  based  work,  we  can  simplify  the  analysis  by  normalizing  the  response
from  each  SL  responding  to  each  different  sample  type  (e.g.  dividing  the  response  from  each  SL  by  the
brightest  measurement  from  all  of  the  SLs  responding  to  one  cell  line  or  tissue  sample).  This  approach
removes  any  labeling  variability  between  samples  of  the  same  type,  as  each  preparation  would  be  considered
a  unique  sample  at  this  point,  while  also  addressing  instrument  variation  (as  long  as  samples  are  run  at  the
same  time)  and  user  variability.  Even  with  this  simplified  analysis,  we  are  still  searching  for  the  optimal
reference  method  to  most  accurately  address  all  of  our  sources  of  variability  while  preserving  the  integrity
of  the  SL  Array  pattern  and  maximizing  the  SL  Array  response.
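The  per-sample  normalization  described  above  can  be  sketched  as  follows.  The  helper  name  and  intensities  are
hypothetical;  the  second  call  illustrates  why  the  approach  removes  labeling  variability,  since  a  sample
labeled  twice  as  efficiently  yields  the  same  normalized  pattern.

```python
def normalize_to_brightest(responses):
    """Scale one sample's SL responses so the brightest SL reads 1.0 (hypothetical helper).

    responses: dict mapping SL name -> intensity for ONE cell line or
    tissue sample, with all SLs measured in the same run.
    """
    brightest = max(responses.values())
    return {sl: value / brightest for sl, value in responses.items()}

# Hypothetical intensities for one sample, and the same sample labeled
# twice as efficiently: the normalized pattern is identical.
sample = {"SL1": 50.0, "SL2": 200.0, "SL3": 100.0}
brighter_label = {sl: 2 * v for sl, v in sample.items()}
print(normalize_to_brightest(sample))  # → {'SL1': 0.25, 'SL2': 1.0, 'SL3': 0.5}
print(normalize_to_brightest(brighter_label) == normalize_to_brightest(sample))  # → True
```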

As  a  final  element  in  these  detailed  investigations,  we  have  just  begun  to  investigate  the  impact  of  dye
charge  and  structure  by  changing  from  fluorescein  and  rhodamine  to  a  pair  of  Alexa  Fluor  dyes  (Figure  16).
These  Alexa  Fluor  dyes  contain  the  xanthene  core  common  to  fluorescein  and  rhodamine.  However,  they  both
contain  ammonium  and  sulfonate  groups  that  promote  water  solubility  while  also  reducing  the  overall  charge
disparity  that  we  see  when  comparing  binding  between  our  SLs  and  fluorescein-  and  rhodamine-labeled
glycoproteins.

Task  3  b):  Upon  identifying  >5  selective  and  cross-reactive  SLs  (Task  1),  we  will  assemble  them,  and  others
identified  in  Task  2,  into  an  array-based  diagnostic  format.  (Months  1-36)

Our  main  focus  in  this  area  has  been  to  develop  a  more  “user-friendly”  platform  for  acquiring  and  analyzing
our  SL  Array  results.  While  fluorescence  microscopy  has  afforded  excellent  results,  it  is  a  labor-intensive  and
time-consuming  process.  Therefore,  microtiter  plate-based  assays  were  tested  as  a  possible  new  array  format.
Two  designs  were  studied.  In  the  first,  SLs  biotinylated  at  the  C-terminus  were  attached  to  neutravidin-coated
plates  and  fluorescein-labeled  analytes  were  bound  to  the  SLs  on  the  plate.  In  the  second  approach,  ELISA
plates  were  coated  with  unlabeled  analyte  and  fluorescein-modified  SLs  were  allowed  to  bind  to  the  analyte
immobilized  in  the  plate  wells.  Either  of  these  formats  would  have  allowed  for  a  plate-based  fluorescence
assay  that  could  be  read  with  any  fluorescence  plate  reader.

In  both  cases,  SLs  bound  to  the  proof-of-concept  glycoprotein  analytes  (BSA,  BSM,  OVA,  PSM)  and  retained
selectivity  trends  similar  to  those  observed  for  SLs  on  beads.  However,  the  affinity  of  the  SLs  for  the
analytes  was  markedly  lower.  As  a  result,  very  weak  intensity  readings  were  obtained  that  did  not  afford  a
large  enough  dynamic  range  to  obtain  intensity  measurements  from  both  strong  and  weak  binders.
Additionally,  the  variability  within  the  assay  was  quite  large  (up  to  50%),  making  pattern  analysis  nearly
impossible.  Still,  given  the  encouraging  results  obtained  using  simple  monovalent  SLs,  we  plan  to  revisit  this
concept  once  polyvalent  SLs  are  available.

As  we  continue  to  evaluate  alternate  assay  formats,  we  have  turned  to  a  higher-throughput  method  of
collecting  and  analyzing  fluorescence  data  associated  with  SL-glycoprotein  binding,  namely  flow  cytometry,
as  a  means  to  read  out  the  fluorescence  intensity  derived  from  labeled  analytes  binding  with  resin-bound  SLs.
This  approach  has  the  added  benefit  of  allowing  SL  synthesis  and  glycoprotein  binding  to  be  carried  out  on
beads,  thereby  maintaining  our  desired  polyvalency  and  the  resulting  binding  characteristics  of  the  beads.
Briefly,  synthesis  of  SLs  followed  the  same  procedure  as  described  above,  differing  only  in  that  10  µm
monodisperse  TentaGel  beads  were  used  to  meet  the  particle  size  limits  for  analysis  on  the  available  flow
cytometer  (BD  LSR  II).  SLs  synthesized  on  these  beads  had  binding  profiles  similar  to  those  synthesized  on
larger  300  µm  beads,  as  assessed  using  flow  cytometry  (Figure  17).

Evaluation of cell membrane extracts has focused on colon and prostate derived cell lines. Four human colon cell lines were chosen for analysis: CCD 841 CoN (healthy); HCT116 and HT29 (cancerous non-metastatic); and LoVo (cancerous metastatic). SL1-9 were bound individually to fluorescein-labeled cell membrane extracts. Unbound protein was washed away and the beads were passed through a BD LSR II flow cytometer. Individual intensity readings were recorded for each bead within a sample; extracting data in this way yielded intensity values from hundreds of beads. Outliers were rejected at 1.8 interquartile distances (IQDs) and intensity readings were normalized to one using the brightest reading for each cell line. To complement existing colon cancer data, Linear Discriminant Analysis (LDA) was carried out and classification accuracies were determined for discrimination between healthy, cancerous non-metastatic and cancerous metastatic cell lines based on leave-one-out cross-validation. Compared to the 93% classification accuracy obtained for data acquired from microscope images of large beads, the classification accuracy from flow cytometry data was only 73%.
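The outlier rejection and normalization steps just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pipeline actually used: we assume that "1.8 IQDs" means Tukey-style fences at Q1 - 1.8·IQR and Q3 + 1.8·IQR, applied separately to each SL's bead population, and that "the brightest reading for each cell line" is the largest retained reading across all SLs for that cell line. The function names, the dictionary layout and the example readings are hypothetical.

```python
import statistics

def reject_outliers(values, k=1.8):
    """Keep readings inside [Q1 - k*IQR, Q3 + k*IQR] (assumed meaning of 'k IQDs')."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles of the bead readings
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def clean_cell_line(readings, k=1.8):
    """readings: hypothetical {SL name: [per-bead intensities]} for ONE cell line.

    Outliers are removed per SL, then every retained reading is divided by the
    brightest retained reading for the cell line, so the maximum becomes 1.0.
    """
    kept = {sl: reject_outliers(vals, k) for sl, vals in readings.items()}
    brightest = max(max(vals) for vals in kept.values())
    return {sl: [v / brightest for v in vals] for sl, vals in kept.items()}

# Illustrative (made-up) readings: the 500 on SL1 is a stray bright bead.
example = {
    "SL1": [98, 99, 100, 100, 101, 101, 102, 500],
    "SL2": [48, 49, 49, 50, 50, 51, 51, 52],
}
cleaned = clean_cell_line(example)
```

In this made-up example the stray 500 reading on SL1 falls outside the fences and is dropped, and all remaining values are scaled by 102, the brightest retained reading. Because the quartiles are estimated from the same readings they screen, the fences are only trustworthy when each SL contributes many beads, as flow cytometry allows.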
Figure 17. Comparison of the binding selectivity of SL1, 3, 4 and 5 for four glycoproteins (OVA, BSM, PSM and BSA) when data are collected by A) microscopy or B) flow cytometry.
Focusing on prostate derived cell lines, six human prostate cell lines were analyzed: RWPE-1 (healthy); WPE1-NA22 and WPE1-NB14 (cancerous non-metastatic); and LNCaP, DU145 and PC-3 (cancerous metastatic). Based on LDA of flow cytometry data, the prostate cell lines were classified with 81% accuracy (Figure 18), compared to 100% from microscope images for this three-class model. When simply classifying samples as healthy or cancerous using flow cytometry, our SL Array predicts sample type with 97% accuracy. In comparison to microscope-based data collection and analysis, this method allows a comprehensive and unbiased approach. While these results are indeed exceptional for this type of analysis, the outcomes are clearly not as robust as those of the microscopy-based analysis, and progress needs to be made if this design is to compete. One area needing further work is how we apply boundaries to our raw data. The sheer amount of data obtained from flow cytometry can be overwhelming, and more detailed analysis of the quality of these data sets needs to be carried out.
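For readers less familiar with the validation scheme used throughout this section, leave-one-out cross-validation of an LDA classifier can be sketched as below. This is a textbook Gaussian LDA (class means, one pooled within-class covariance, class priors) written with NumPy; it is an illustration, not the software used to produce the accuracies reported here. The function names are hypothetical, and the data are synthetic stand-ins for normalized SL fingerprints (one row per sample, one column per SL).

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means, pooled within-class covariance and class priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = np.vstack([X[y == c] - means[k] for k, c in enumerate(classes)])
    cov = resid.T @ resid / (len(X) - len(classes))  # pooled covariance
    priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.pinv(cov), priors

def lda_predict(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, cov_inv, log_priors = model
    scores = (X @ cov_inv @ means.T
              - 0.5 * np.sum(means @ cov_inv * means, axis=1)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]

def loocv_accuracy(X, y):
    """Refit on n-1 samples, predict the held-out one, repeat for every sample."""
    hits = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i
        model = lda_fit(X[train], y[train])
        hits += int(lda_predict(model, X[i:i + 1])[0] == y[i])
    return hits / len(X)

# Synthetic example: 3 classes x 15 replicates, 4 "SL" features each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 0, 3, 3]], dtype=float)
X = np.vstack([c + rng.normal(0, 0.2, size=(15, 4)) for c in centers])
y = np.repeat(np.array(["healthy", "non-metastatic", "metastatic"]), 15)
print(loocv_accuracy(X, y))
```

The key property of leave-one-out validation is that the sample being classified never contributes to the model that classifies it, which is why it gives a less optimistic estimate than scoring the training data directly.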
While we continue to improve the efficacy of our SL Array, we are also looking not only to enhance the user interface (as described above) but also to expand the assay's utility by working with clinically relevant and less invasively collected samples (Task 3 d includes a discussion of using our SL Array to assess human tissue samples based on metastatic potential). In an effort to move towards serum-based analysis, we have begun to study the secreted glycoproteins found in the media from cultured cell lines. Four human colon cell lines were chosen for analysis: CCD 841 CoN (healthy); HCT116 and HT29 (cancerous non-metastatic); and LoVo (cancerous metastatic). We have therefore taken secreted glycoproteins isolated from cell culture media as well as membrane extracts from the cells grown in the exact same media. Furthermore, we have evaluated our SL Array response towards media containing fetal bovine serum (FBS) as well as after starving the cells of FBS for 48 hours prior to harvesting the secreted and membrane glycoproteins. Briefly, membrane extracts were isolated and labeled with FITC as previously described. Cell culture media containing the secreted glycoproteins was concentrated using ultracentrifugation. The secreted proteins were then precipitated into acetone, centrifuged, and the supernatant removed. The pellet was washed once with acetone and the previous step repeated. The resulting pellets were re-suspended in buffer and labeled with FITC without any further purification.

Figure 18. 2D LDA plot of flow cytometry data from the SL Array responding to six prostate cancer cell lines, classified with 81% accuracy.

Figure 19 shows the 2D plot of the LDA results from the analysis of secreted (Figure 19A) and membrane extracted glycoproteins (Figure 19B) for each of these four human colon cell lines, using SL1-9 binding to fluorescein-labeled glycoproteins to generate response patterns. Notice that in each plot the data cluster into three groups (normal/healthy, cancerous non-metastatic, cancerous metastatic) with virtually no overlap. Cross-validation of each model indicates classification accuracies of 100% for the secreted and 96% for the membrane extracted glycoproteins. Most excitingly, when these two sets of data are combined and modeled together (Figure 19C), we see excellent overlap, with an overall classification accuracy of 92%, suggesting that there is a high correlation between the amounts and types of glycoproteins secreted and those integral to the cell membrane. In addition, whether or not the cells are starved of FBS makes little difference in the statistical analysis, even with all of the different bovine proteins and glycoproteins present in the sample. All of this taken together provides further support for the use of our SL Array to assess secreted glycoproteins from clinical samples.

Task 3 c): Evaluate the ability of the array to discriminate complex glycans (i.e., TF antigen, Lea, Lex, sLea, sLex). Note that because the development of the arrays will be continually evolving, as we identify new and more selective SLs, the time frame for this task is the entire proposal period. (Months 1-36)

As an initial test of our approach towards binding biologically relevant targets, we used an array of SL1, SL3, SL4 and SL5 to distinguish between five structurally similar cancer associated glycans (TF antigen, Lea, Lex, sLea and sLex; Figure 2A). These glycans were chosen because they represent some of the more common saccharide motifs overexpressed by cancerous cells, and because they are composed of many of the same monosaccharides that were found on our proof-of-concept glycoproteins (OVA, PSM, BSM). SL2 was not included in the array, on the assumption that we could eliminate redundancy due to its response similarities with SL3, and because of its high background binding to BSA as compared with SL3.

Briefly, after incubating each SL with a solution containing biotinylated glycan and fluorescently labeled streptavidin, luminosity values from fluorescence microscope images were analyzed (4 SLs by 5 glycans by 15 replicates). To account for differences in bead size and loading levels, luminosities were normalized against the highest luminosity within a given SL type (in this study the greatest degree of variability stems from bead-to-bead variations). The unique pattern generated for each glycan based on the response of the four different SLs is shown in Figure 20A. Note that each glycan produces a unique and distinguishable pattern that is reproducible within the limits of the associated error.

Figure 20. Differentiation of 5 glycans using an SL array. A.) Fingerprint pattern of the average normalized luminosity intensities from SL1, SL3, SL4 and SL5 responding to 5 different glycans (TF, Lea, Lex, sLea and sLex). B.) The 2-dimensional LDA score plot derived from the patterns shown in (A.) for 15 replicates. Ellipses indicate the 95% confidence level, analyte identification

To interpret patterns that display subtle differences, statistical analyses, specifically linear discriminant analysis (LDA), were used to identify the most significant features necessary for classification of the analytes. From this analysis, Discriminant 1 and Discriminant 2 contain 83.3% and 14.8% of the between-group variation, respectively (Figure 20B). Note that the different glycans are clustered into five groups with an average standard deviation of 6%. Furthermore, the Wilks’ lambda value for this analysis is 0.009 with a p-tail value of <0.000001, indicating that there is a statistically significant difference in the population means at the 95% level of confidence. Based on leave-one-out cross-validation, the SL array correctly classified 71 of the 75 measured samples (94.7% classification accuracy, with a chance accuracy of only 20%). Significantly, the Lewis antigens and their sialylated forms (Lea/Lex and sLea/sLex) were efficiently discriminated while differing only by the addition of a terminal sialic acid moiety. Additionally, this SL array impressively distinguished between Lea and Lex, as well as between sLea and sLex, glycans where the only structural difference is the regiochemistry of the linkage to the core GlcNAc moiety (Figure 2A). Of the four misclassified glycans, Lea was twice identified as sLea, sLea was once classified as Lea, and Lex was once recognized as sLea.
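The statistics quoted in the preceding paragraph follow standard definitions, summarized here for reference; these are textbook formulas, not a description of the specific software used. With W the pooled within-group scatter matrix, B the between-group scatter matrix and λ_k the eigenvalues of W⁻¹B, Wilks’ lambda, the share of between-group variation carried by discriminant k, and the two accuracy figures are:

```latex
\begin{aligned}
\Lambda_{\mathrm{Wilks}} &= \frac{\det(\mathbf{W})}{\det(\mathbf{W}+\mathbf{B})}
  = \prod_{k}\frac{1}{1+\lambda_k}, &
\text{share}_k &= \frac{\lambda_k}{\sum_j \lambda_j},\\
\text{chance accuracy} &= \frac{1}{5} = 20\%, &
\text{LOOCV accuracy} &= \frac{71}{75} \approx 94.7\%.
\end{aligned}
```

A Wilks’ lambda close to zero, such as the 0.009 reported here, means that almost all of the total scatter lies between the glycan groups rather than within them. The 20% chance level assumes five equally sized classes, which holds here (15 replicates per glycan, 75 samples in total).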

To further validate the utility of our SL Array for discriminating these five structurally similar glycans, the more statistically robust “bootstrapping” approach was used. Fifty separate and unique data sets were generated using the Mersenne Twister random number generator. Overall, this analysis yielded 94% classification accuracy, correctly classifying individual glycans 86-99% of the time. As with the leave-one-out analysis, the three most frequent misclassifications were Lea misclassified as sLea (9.3%), sLea misclassified as Lea (6.7%), and Lex misclassified as sLea (4.7%). To stress the limits of this array for differentiating glycans still further, training and test sets were chosen at random from the Normal distribution, splitting our data in half. One half was used as a training set to create a statistical model, and the other half as a test set to assess the ability of this model to accurately identify these “unknowns.”

Random set generation and subsequent analyses were carried out 25 times to create replicates in order to minimize systematic error. Consistent with the previously described analyses, the overall classification accuracy of this approach was 94%. The consistency displayed across the three methods further testifies to the strength of the outlined SL Array design for discriminating structurally similar cancer associated glycans.
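The two resampling schemes described above can be sketched in Python as shown here. Several points are assumptions rather than details given in this report: each bootstrap data set is taken to be a resample drawn with replacement, with the model scored on the samples left out of that resample (out-of-bag); the half split is a uniform random shuffle, since how the Normal distribution was used to choose the sets is not specified; and, to keep the block short and dependency-free, a nearest-centroid classifier stands in for LDA. All function names are hypothetical. Python's `random` module does use the Mersenne Twister generator, matching the generator named above.

```python
import random
import statistics

def centroid_fit(X, y):
    """Stand-in classifier (not LDA): the mean fingerprint of each class."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: [statistics.fmean(col) for col in zip(*rows)] for c, rows in groups.items()}

def centroid_predict(model, x):
    """Label of the class centroid closest to x (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def score(model, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(centroid_predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)

def bootstrap_oob(X, y, n_sets=50, seed=0):
    """Mean out-of-bag accuracy over n_sets bootstrap resamples."""
    rng = random.Random(seed)  # CPython's random module is a Mersenne Twister
    n, accs = len(X), []
    for _ in range(n_sets):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n samples with replacement
        oob = sorted(set(range(n)) - set(idx))      # samples never drawn this round
        if not oob:
            continue
        model = centroid_fit([X[i] for i in idx], [y[i] for i in idx])
        accs.append(score(model, [X[i] for i in oob], [y[i] for i in oob]))
    return statistics.fmean(accs)

def repeated_half_split(X, y, n_repeats=25, seed=0):
    """Mean test accuracy over n_repeats random 50/50 train/test splits."""
    rng = random.Random(seed)
    n, accs = len(X), []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[: n // 2], idx[n // 2:]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train])
        accs.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return statistics.fmean(accs)
```

Any classifier with a fit/predict pair can replace the centroid functions without changing the resampling logic, which is what makes these schemes useful for comparing models on equal footing.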

We next generated a lectin array containing 19 synthetic lectins (SLs) and investigated the binding affinity of each with six different glycans whose expression is commonly altered in prostate and colorectal cancers. This was done primarily to identify SLs selective towards certain glycans and then to better understand the chemical basis of that selectivity. To ensure that the SLs bind to the glycans and not to the fluorescent dye, and to maintain a constant dye:glycan ratio, we made use of biotin-tagged CAGs. After incubation with the CAGs, fluorescently tagged streptavidin was added to generate an optical signal upon binding of streptavidin to biotin. Figure 21A shows increased binding of SL5-RR over SL5 to sialylated Lea (sLea). This is an example of how the SL-glycan interaction is enhanced upon the introduction of additional positively charged arginine residues (R) in the sequence of SL5-RR, leading to a stronger charge-pairing interaction with sLea than with non-sialylated Lea. Figure 21B displays selective binding of SL11, over 18 other SLs, to non-reducing fucose. This could be attributed to the extra boronic acid present on the free N-terminus of SL11, assisting in fucose binding. SL11 also contains four phenyl rings that could contribute to CH-π type interactions with fucose.

To investigate the cross-reactivity of the SLs and their ability to distinguish all 6 CAGs, we constructed an array, thereby further testing our hypotheses regarding SL-glycan interactions and boronic acid-diol binding.


Figure 21: Selective binding of Synthetic Lectins (SLs). A. Output of SL5 and SL5-RR with cancer associated glycans (CAGs) such as sLea and its non-sialylated counterpart Lea, illustrating that extra arginine residues (positively charged amino acids) at the N-terminus of SL5-RR help in binding to the sialic acid in sLea. B. SL11 selectively binding to fucose, illustrating that the boronic acid on the N-terminus of SL11 assists the fucose interaction (the 4 phenyl rings in the SL11 peptide sequence assist hydrophobic interactions with the non-polar fucose residue).
Depicted in Figure 22A is the output obtained when we employed a guided statistical approach, LDA, to discern these 6 CAGs with >99% classification accuracy. It is noteworthy that the sialylated CAGs (sLea and sLex) and their non-sialylated counterparts (Lea and Lex) are close to each other in these models. Lea is also closely situated to sLea. These observations signify the capability of the array to discern sialylated from non-sialylated glycans, as well as small structural differences between sLea and sLex. TF antigen (TFA) is a disaccharide and does not possess the same glycan motif shared by the other CAGs involved in this study; hence it is uniquely classified. The two SLs that contributed the most to this classification were SL7 and SL2-Dab. We hypothesize that these two SLs dominate primarily because of the greater number of aromatic rings in SL7 (CH-π interactions) and the greater number of positive charges in SL2-Dab (ionic interactions).

An LDA model built only from SLs containing a greater number of aromatic rings (and few positive charges), similar to SL7, resulted in a decrease in the tightness of the model (Figure 22B), indicating a loss in precision and compromising the classification accuracy, which fell to 83%.

Similarly, an LDA model with SLs having a greater number of positive charges and no boronic acids (similar to SL2-Dab) shows a reduction in classification accuracy to 93% (Figure 22C). These results suggest that boronic acid residues are important for discriminating sLea, sLex and Lea. In addition, the model derived from positively charged SLs retains excellent ‘tightness/precision’, indicating that these positively charged SLs provide a microenvironment necessary for accurate binding. The unguided statistical models offer similar insight, indicating that positively charged amino acids are important for precision and boronic acids are important for accuracy.

As indicated above, it is possible that the SLs interact not only with the glycan but also with the protein portion of glycoproteins. In this analysis the protein component, FITC-streptavidin, is the same for each glycan being analyzed. As such, any observed difference in the response from the array must be attributed to the glycan constituent. Given the structural similarities between these glycans, it is remarkable that there were not more misclassifications. In total, these results validate our ability to differentiate structurally similar cancer associated glycans with high accuracy using a small, cross-reactive SL Array.

Task 3 d): Evaluate the ability of the array to discriminate prostate cancer cell lines (i.e. PC-3, LNCaP, and DU145), as well as RWPE-1, WPE1-NA22, WPE1-NB14, WPE1-NB11 and WPE1-NB26, which are referred to as the MNU cell lines, all available from the ATCC. Note that because the development of the arrays will be continually evolving, as we identify new and more selective SLs, the time frame for this task is the entire proposal period. (Months 1-36)


Figure 22: Guided statistical output of the SL array with various CAGs. TF antigen, sialyl Lewis X (sLex) and sialyl Lewis A (sLea) are some CAGs that tend to be overexpressed in colon cancer. Circles reflect 95% confidence intervals. A. 100% classification of all CAGs due to lectins SL7 (high number of phenyl rings and low positive charge) and SL2-Dab (no boronic acids and high positive charge). B. Loss of accuracy to 83% when modeled with SLs having only a greater number of phenyl rings. C. 93% classification with SLs not containing boronic acids, showing boronic acids to be useful for discerning sLea, sLex and Lea.
The vast majority of our work to date in developing and working with arrays has focused on how we analyze our array data. As described above, we have improved our data collection methods to obtain better consistency between replicate measurements, as well as optimizing how intensity values are extracted.

In this regard, we have begun to evaluate our array response using color space intensities and not just luminosity. In particular, we have focused on the popular “Red-Green-Blue” (RGB) color space to obtain more of a full spectral response from our array. In so doing, we have improved our classification accuracy from 97% to 100% for a five cell line panel (HT-29, CT-26, CT-26-F1, CT-26-FL3 and 3T3/NIH), made up of 114 replicates, that we often use to evaluate our models.
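The difference between a luminosity-only and an RGB read-out can be sketched as follows. This is a hypothetical Python illustration: the report does not say how luminosity was computed, so the Rec. 601 luma weights (0.299, 0.587, 0.114) are an assumption, as are the function names and the idea of averaging pixels over a bead region. The point is simply that each SL contributes three features instead of one, giving the downstream classifier more of the spectral response to work with.

```python
def bead_features(pixels):
    """Mean R, G, B over a bead's pixels, plus a luminosity value.

    pixels: list of (r, g, b) tuples from one bead region (hypothetical input).
    Luminosity uses Rec. 601 luma weights, an assumption for illustration.
    """
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return {"rgb": (r, g, b), "luminosity": luma}

def fingerprint(per_sl_pixels, use_rgb=True):
    """Concatenate per-SL features in a fixed (sorted-by-name) SL order."""
    vector = []
    for sl in sorted(per_sl_pixels):
        feats = bead_features(per_sl_pixels[sl])
        vector.extend(feats["rgb"] if use_rgb else [feats["luminosity"]])
    return vector

# Two SLs, two pixels each (made-up values).
beads = {"SL1": [(0, 255, 0), (0, 255, 0)], "SL2": [(10, 20, 30), (30, 20, 10)]}
print(len(fingerprint(beads, use_rgb=False)), len(fingerprint(beads)))
```

With two SLs, the luminosity-only fingerprint has one entry per SL and the RGB fingerprint three, so an array of nine SLs would yield 9 versus 27 features per sample.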

To further validate our approach, we have assessed the ability of our array to identify analytes that it has never seen before. Specifically, we used ten cell lines, including a mix of mouse and human lines, derived from colon (7: 3T3, HT29, HCT116, CT26, CT26-F1, CT26-FL3 and LoVo), breast (2: MCF7 and MCF10A) and prostate (1: PC3) tissue. To do this, we created a statistical model based on 9 cell lines while leaving out the data from one cell line, and then attempted to classify the excluded line, in much the same way that a diagnostic test must determine the disease status of a patient who did not contribute to the calibration data set. When classifying our samples as healthy, cancerous/non-metastatic or cancerous/metastatic in this way, we obtained only 56% overall classification accuracy (Figure 23, blue). However, if we simply look to diagnose the cancer and not its stage at the same time, thereby identifying our data as either healthy or cancerous, we improve our overall classification accuracy to just over 83% (Figure 23, green). Furthermore, by ignoring the 3T3/NIH mouse fibroblast line, the most out-of-place cell line in this analysis, and looking at the remaining nine cell lines using this same approach, we can “diagnose” the presence of cancer 100% of the time, with a sample set of n = 434.
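Holding out an entire cell line, rather than a single replicate, is a leave-one-group-out cross-validation. A minimal Python sketch follows; the nearest-centroid classifier is a stand-in chosen to keep the example short (it is not LDA), and all names and data are hypothetical. One property the sketch makes visible is general rather than specific to this report: if a class is represented by only one cell line, holding that line out removes the class from the training data, so its samples cannot receive the correct three-class label, whereas the coarser healthy vs. cancerous label can still be recovered.

```python
import statistics

def centroid_fit(X, y):
    """Stand-in classifier (not LDA): mean feature vector for each label."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: [statistics.fmean(col) for col in zip(*rows)] for c, rows in groups.items()}

def centroid_predict(model, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def leave_one_group_out(X, y, groups):
    """Per-cell-line accuracy when all of that line's samples are held out."""
    results = {}
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        test = [i for i, gi in enumerate(groups) if gi == g]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(centroid_predict(model, X[i]) == y[i] for i in test)
        results[g] = hits / len(test)
    return results

def to_two_class(label):
    """Collapse the 3-class labels to healthy vs. cancerous."""
    return "healthy" if label == "healthy" else "cancerous"

# Made-up data: five "cell lines", three replicates each, 2 features.
centers = {"A": (0, 0), "B": (1, 0), "C": (10, 10), "D": (11, 10), "E": (20, 20)}
labels = {"A": "healthy", "B": "healthy", "C": "non-metastatic",
          "D": "non-metastatic", "E": "metastatic"}
X, y, groups = [], [], []
for line, (cx, cy) in centers.items():
    for d in (-0.1, 0.0, 0.1):
        X.append([cx + d, cy - d])
        y.append(labels[line])
        groups.append(line)

three_class = leave_one_group_out(X, y, groups)
two_class = leave_one_group_out(X, [to_two_class(lab) for lab in y], groups)
```

In this toy data set, line E is the only "metastatic" line, so it scores 0 under the three-class labels but 1 under the two-class labels, mirroring in miniature why collapsing to healthy vs. cancerous can raise hold-out accuracy.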

Finally, we realize that linear discriminant analysis (LDA) is not necessarily the best approach for analyzing our data. We also recognize that not all samples can be controlled as tightly as ours have been previously. As such, we evaluated our complete data set derived from colon cancer cell lines, including variations in incubation time (1 h to 24 h), incubation temperature (4 °C, 25 °C and 37 °C) and sample dilution (20x, 50x and 100x). In total this afforded nearly 12,000 measurements, leading to 3000 different fingerprints from our SL Array. Using support vector machines we were able to obtain 93% classification accuracy, and using regression tree analysis we improved the classification accuracy to 97%. Working closely with Prof. Edsel Pena in the Department of Statistics at the University of South Carolina, we are continuing to explore our options, being cautious that the approach we take is appropriate for the type of analysis we are doing, as well as verifying that we do not “over-train” our models and that we maintain statistical validity.

Figure 23. Individual classification accuracies (3-class groupings in blue, 2-class groupings in green) for each of 10 cell lines derived from a model lacking all input from that line.

As part of our focus during the final years of funding on this project, we aimed to expand our previous work, which focused primarily on colon cancer related cell lines, to include prostate cancer related samples as well. Furthermore, existing data analyses from ten colon cancer cell lines using our SL Array relied heavily on murine cell lines (nearly half), and we wanted this aspect of our analyses to focus on human derived cell lines only. Therefore, we expanded our analysis to include 15 human cell lines consisting of four colon derived cell lines (HCT 116, LoVo, HT-29, CCD 841 CoN), five breast derived cell lines (MCF10A, MCF7, MDA-MB-231, BT474, T47D) and six prostate derived cell lines (LNCaP, DU145, PC-3, RWPE-1, WPE1-NA22, WPE1-NB14). Initially, a healthy human colon cell line was tested in place of NIH 3T3s, a murine fibroblast cell line, as part of a colon cancer model for determining metastatic potential. The addition of the human normal colon cell line (CCD 841 CoN) in place of a mouse cell line (NIH 3T3) improved overall classification accuracies. Known tissue-specific differences in glycosylation led to the addition of tissue-specific normal cell lines, specifically RWPE-1, a healthy prostate cell line.

When examining data from all cell lines, the removal of the murine cell line data, together with the increase in the number of human cell lines, resulted in the ability to classify cell lines as healthy vs. cancerous with 79% accuracy. A significant improvement was seen when classifying samples as healthy, cancerous/non-metastatic or cancerous/metastatic for the human-only cell lines, with a classification accuracy of 76%, as compared to only 56% for the model that included the murine cell line data.

However, since tissue-specific differences in glycosylation have been shown to occur, we evaluated the ability of our array to distinguish these cell lines based on tissue type. The top pane in Figure 24 shows the two-dimensional data spread from using LDA to analyze 13 different cell lines based on tissue type: four colon derived cell lines (HCT 116, LoVo, HT-29, CCD 841 CoN), three breast derived cell lines (MCF10A, MCF7, MDA-MB-231) and six prostate derived cell lines (RWPE-1, WPE1-NA22, WPE1-NB14, LNCaP, DU145, PC-3), achieving 90% classification accuracy based on leave-one-out cross-validation. The same type of LDA analysis was then run on each tissue type cluster identified in this first classification step, to evaluate these subsets for differing metastatic potential. Independently, each tissue type classified excellently (98% for breast, 93% for colon and 100% for prostate) based on the 3-class identification paradigm of normal (healthy), cancerous non-metastatic or cancerous metastatic. Overall this afforded between 84% and 90% classification accuracy, compared to 76% when tissue type was not included in the analysis.
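One reading of the 84% to 90% range that is consistent with the numbers above, offered here as an interpretation rather than the stated calculation, is that a sample must first be assigned to the correct tissue (90%) and then to the correct metastatic class within that tissue. Treating the two stages as independent, the overall accuracy is then roughly the product of the two. A short Python check of that arithmetic:

```python
# Assumed two-stage model: overall ~ P(correct tissue) * P(correct class | tissue).
tissue_accuracy = 0.90  # stage 1: tissue type (top pane of Figure 24)
within_tissue = {"breast": 0.98, "colon": 0.93, "prostate": 1.00}  # stage 2

overall = {t: tissue_accuracy * acc for t, acc in within_tissue.items()}
for tissue, acc in sorted(overall.items()):
    print(f"{tissue}: {acc:.1%}")
```

Under this assumption the products are about 88% for breast, 84% for colon and 90% for prostate, which brackets the reported range.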
Based on our particular interest in prostate cancer, we looked more closely at the data from only the prostate-derived cell lines.

Briefly, the RWPE-1 cell line is also the parent cell line of a series of cell lines transformed by exposure to N-methyl-N-nitrosourea (MNU). These cell lines are representative of a progression from normal cells to cancerous/metastatic cells. To the best of our knowledge, there are no commercially available low-/non-metastatic prostate cancer cell lines isolated from patient tissue.

Therefore, of the four original MNU cell lines, WPE1-NA22 and WPE1-NB14 were analyzed first as part of our prostate cancer model to assess metastatic potential, because these are the closest to healthy and thus representative of cancerous non-metastatic cells.
Figure 24. 2D LDA score plot of microscope-obtained data from the SL Array (i.e. SL1-9) responding to membrane extracted glycoproteins from 13 different cell lines based on tissue type, with a classification accuracy of 90%. The bottom panes depict the LDA score plots from each tissue type cluster identified in the previous analysis (top pane). Classification based on the 3-class identification paradigm of normal (healthy), cancerous non-metastatic or cancerous metastatic resulted in classification accuracies of 98% for breast, 93% for colon and 100% for prostate.


Using LDA, we were able to show that our SL Array can distinguish between healthy (RWPE-1), cancerous/non-metastatic (WPE1-NA22, WPE1-NB14) and cancerous/metastatic cell lines (LNCaP, DU145, PC-3) with 100% accuracy (Figure 25). Here, we see an obvious trend in the clustering based on metastatic potential regardless of cell line. For example, and perhaps not so surprising given that they are isogenic cell lines, WPE1-NA22 and WPE1-NB14 cluster tightly together (red squares and green triangles, Figure 25); however, note that the RWPE-1 parent cell line (blue diamonds, Figure 25) is clearly separated from these tumorigenic cell lines. More notable is how the data from the LNCaP, DU145 and PC-3 cell lines cluster together (purple X, turquoise asterisks and orange circles, Figure 25), thereby indicating strong similarities in the glycosylation patterns of these more aggressive cell lines and, overall, suggesting a high correlation between array response and metastatic potential.

obtained data from the SL Array (i.e. SL1-9) responding to membrane extracted glycoproteins from six prostate derived cell lines, distinguishing normal/healthy (RWPE-1), cancerous non-metastatic (WPE1-NA22 and WPE1-NB14) and cancerous metastatic (LNCaP, DU145 and PC-3) groups of cell lines with 100% accuracy.

In addition to increasing the number of cell lines used as analytes, the number of SLs used as part of the array has also been increased. SL2 and SLs 6-9 were initially discounted as part of an array due to low selectivity or repetition of selectivity. However, since tissue-specific differences in glycosylation have been shown to occur, all SLs were included in the SL Array to determine whether they provided greater accuracy in determining the metastatic potential of prostate cancer. When our original SL Array, including SL1, 3, 4 and 5, was used to evaluate the six prostate cell lines based on metastatic potential, the array was able to distinguish between healthy, cancerous non-metastatic and cancerous metastatic cell lines with 93% accuracy (Figure 26A). When the same cell lines are assessed using the nine-membered SL Array (SLs 1-9), the accuracy increases to 100% (Figure 26B). These data suggest that while the individual SLs’ binding selectivities may not be greatly different with respect to OVA, BSM and/or PSM, the inclusion of these differentially binding SLs in the array provides incremental information for discriminating cell lines of differing metastatic potential. For example, and as described above, SL2 was previously excluded from our SL Array because of the similarities in its response with SL3 to purified glycoproteins, as well as its high BSA background binding; still, in the current analysis, SL2 accounted for 25% of the variance in the array that could discriminate prostate derived cell lines with 100% accuracy (i.e., 25% of the discriminatory ability of the array was provided by SL2).
Figure 26. 2D LDA score plots of microscope-obtained data from different SL Arrays responding to membrane extracted glycoproteins from six prostate derived cell lines (RWPE-1, WPE1-NA22, WPE1-NB14, LNCaP, DU145, PC-3). The plots derive from using SLs 1, 3, 4 and 5 (A) and SLs 1-9 (B) and provide 93% and 100% classification accuracy, respectively, based on the 3-class identification paradigm of normal (healthy), cancerous non-metastatic or cancerous metastatic.


In addition, array diversity was expanded by including the charged arginine mutants discussed in Task 3 a. In evaluating the same four human colon derived cell lines, using SL1-5 along with SL5-Dab, SL5-RR, Ac-(RAA)3 and Ac-(RA)4 in different combinations provided classification accuracies between 95% and 100% (Figure 27). While using SL1-9 consistently produced 100% classification for this group of cell lines, it is interesting to note that SL4, SL5-Dab and SL5-RR accounted for nearly 81% of the diversity captured by this array, suggesting that charge is important in certain analyses and adds to our array capabilities.



While the current array classification accuracies continue to improve through refinement of the SLs in the array and improved statistical modeling, this is not our ultimate goal. Cell lines do serve as acceptable in vitro models for cancer; however, they do not always represent the complexity of the tumor microenvironment. To examine whether our SL Array could work with clinical specimens, tissue from 11 colon cancer patients was analyzed using the SL Array containing SL1-9. This tissue was readily available from the Colon Cancer Research Centre Tissue Biorepository, of which JJL is a member. Each patient tissue sample consisted of one tumor sample and one normal or healthy sample taken from an adjacent site.

Figure 28: depicting work-flow of incubating SL library (2 or 3BA) with

Briefly, the tissues were ground in liquid nitrogen and the resulting powder was added to lysis buffer. Membrane proteins were extracted using the Qiagen membrane extraction kit, labeled with FITC and incubated with SLs 1-9. Fluorescence intensity data were collected for each sample using fluorescence microscopy. Outliers were rejected at 1.8 interquartile distances (IQDs), and intensity readings were normalized to one using the brightest reading for each patient sample. Using de-identified patient disease data, LDA was carried out to determine the ability of the array to differentiate patient samples based on a number of factors. Of greatest importance was the ability of the array to tell the difference between healthy and cancerous tissues and to accurately stage the cancer. Using this nine-SL array, we were able to distinguish healthy and cancerous samples with 83% accuracy and stage the cancer with 91% accuracy. Most interestingly, we were able to identify patients who had received pre-adjuvant chemotherapy prior to surgery, as the “normal/healthy” tissue samples from these patients more closely resembled tumor tissue samples than they did normal/healthy tissue from patients who had not yet received any chemotherapy.
These  initial  results  demonstrate  that  our  array  not  only  discriminates  between  cell  types  effectively  in  in 
vitro  cell  line  models  but  also  in  tissue  samples.  This  promising  data  suggests  that  the  development  of  the 
array  for  clinical  utility  is  possible.  The  next  set  of  pressing  experiments  is  to  evaluate  tissue  samples  from 
prostate  cancer  patients  and  to  follow-up  with  serum  samples  from  similar  patients. 
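The preprocessing described above (rejecting outliers at 1.8 IQDs, then scaling each patient sample so that its brightest reading equals one) can be sketched as follows. This is a minimal Python illustration, not our actual analysis code: it assumes the 1.8 IQD fences sit below the first quartile and above the third quartile, and the function names and intensity values are hypothetical.

```python
import statistics


def reject_outliers_iqd(readings, k=1.8):
    """Keep readings within k interquartile distances (IQDs) of the quartiles.

    Assumption: fences are Q1 - k*IQD and Q3 + k*IQD, with quartiles
    computed by the 'inclusive' method of statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(readings, n=4, method="inclusive")
    iqd = q3 - q1
    low, high = q1 - k * iqd, q3 + k * iqd
    return [r for r in readings if low <= r <= high]


def normalize_to_brightest(readings):
    """Scale readings so that the brightest one equals 1."""
    brightest = max(readings)
    return [r / brightest for r in readings]


# Hypothetical bead intensities from one patient sample.
raw = [40, 42, 45, 47, 50, 52, 200]
kept = reject_outliers_iqd(raw)        # 200 lies above the upper fence (about 64.5)
scaled = normalize_to_brightest(kept)  # 52 becomes 1.0
print(kept)
print([round(s, 3) for s in scaled])
```

Normalizing after outlier rejection matters here: had the 200 reading been kept, every other value would have been compressed toward zero.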


Taking advantage of the dual-fluorescent-dye competitive binding platform, we screened our library to discriminate secreted proteins from normal and metastatic cell lines. One advantage of using secreted proteins rather than membrane extracts is that it minimizes the impact on native protein structures; it also tests the ‘sensitivity’ of the SLs (because proteins of interest may be present at low concentration in the secreted protein mixture). Proteins concentrated from a normal prostate cell line (RWPE-1) were labeled green using FITC, and proteins from a metastatic, androgen-independent, non-PSA-expressing cell line (DU145) were labeled red with rhodamine (the inverse was also tested to ensure that the charge on the dye did not influence binding; similar results were observed). Competitive binding screens between these differentially labeled analytes were carried out with two libraries: one with an extra boronic acid on the N-terminus, giving three boronic acid residues on each SL (3BA), and the original library, which has only two boronic acids and a free N-terminus (2BA). The 3BA library consequently has one extra aromatic ring in each SL sequence. Both libraries were screened using secreted proteins from prostate and colon cell lines. Figure 28 depicts the screening process, beginning with separate incubation of each library with a dual-dye mix of normal and metastatic proteins. After incubation, the library beads were imaged under both filters, and beads fluorescing in only one channel were selected as normal or cancerous hits. SL hits were sequenced, re-synthesized and studied. It was observed (Figure 29A) that the 3BA library had greater binding diversity with prostate cancer cells, whereas the 2BA library had greater diversity with normal prostate and colon cell lines (Figure 29B). We hypothesize that this is due in part to the extra aromatic ring in the 3BA library, which increases CH-π interactions with metastatic prostate glycoproteins (these are known to have greater fucosylation than those of normal prostate cell lines).

Figure 29: Binning chart representing the population diversity of bead fluorescence intensity in the 3BA and 2BA SL libraries with (A) human metastatic prostate (DU145) and human normal prostate (RWPE-1) cell lines.
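The hit criterion for the dual-dye competitive screen (a bead is a hit when it is bright in one channel but not in the other) reduces to a simple per-bead rule. The sketch below is hypothetical: the dye assignment follows the screen described above (FITC/green for RWPE-1 proteins, rhodamine/red for DU145 proteins), but the 8-bit thresholds `bright=100` and `dim=30` and the `Bead` record are illustrative assumptions, not values from our protocol.

```python
from dataclasses import dataclass


@dataclass
class Bead:
    bead_id: str
    green: float  # FITC channel: proteins from the normal line (RWPE-1)
    red: float    # rhodamine channel: proteins from the metastatic line (DU145)


def call_hit(bead, bright=100.0, dim=30.0):
    """Classify one bead from a dual-dye competitive screen.

    Hypothetical 8-bit thresholds: a channel is 'bright' at or above `bright`
    and 'dim' below `dim`. Returns 'normal', 'metastatic' or None (not a hit).
    """
    if bead.green >= bright and bead.red < dim:
        return "normal"
    if bead.red >= bright and bead.green < dim:
        return "metastatic"
    return None  # dark in both channels, bright in both, or ambiguous


beads = [
    Bead("b1", green=180, red=12),   # binds only the FITC-labeled proteins
    Bead("b2", green=15, red=210),   # binds only the rhodamine-labeled proteins
    Bead("b3", green=190, red=170),  # bright in both channels: not a hit
    Bead("b4", green=10, red=8),     # no binding
]
for b in beads:
    print(b.bead_id, call_hit(b))
```

Requiring the other channel to be dim, rather than merely dimmer, is what excludes beads that bind both protein populations.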

Several different glycoprotein sources were screened, including proteins secreted by cell lines and proteins from pooled clinical tissue samples, producing a new series of SL hits that exhibit some interesting trends. For example, it was observed (Table 3) that SL hits for normal prostate and colon glycoproteins had a higher proportion of polar amino acids (e.g., S, T, N and Q). SL hits for metastatic colon cancer had a greater number of arginine residues (R), whereas SL hits for metastatic prostate cancer had a greater number of aromatic rings (from amino acids such as F and Y and/or from phenylboronic acid) in the isolated SL sequences.
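The composition trends above (polar S/T/N/Q content, arginine count and aromatic ring count) amount to simple counts over each hit sequence. A minimal sketch follows. It assumes hypothetical one-letter sequences in which the lowercase token `b` stands for a residue carrying a phenylboronic acid and counts one aromatic ring per F, Y, W or `b`. Both the encoding and the example sequences are illustrative; they are not actual hits from Table 3.

```python
POLAR = set("STNQ")
# 'b' is a hypothetical token for a phenylboronic acid-modified residue;
# one ring is counted per aromatic residue (a simplification for W).
AROMATIC = set("FYWb")


def composition(seq):
    """Count the features compared across SL hits."""
    return {
        "polar_STNQ": sum(aa in POLAR for aa in seq),
        "arginine": seq.count("R"),
        "aromatic_rings": sum(aa in AROMATIC for aa in seq),
        "boronic_acids": seq.count("b"),
    }


hits = {
    "hypothetical_normal_hit": "SbTNQbG",
    "hypothetical_metastatic_prostate_hit": "RYbFYbR",
}
for name, seq in hits.items():
    print(name, composition(seq))
```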

Finally, on a different but quite exciting note, we have been able to classify triple negative breast cancer (TNBC) using our SL Array with high accuracy. Since we have demonstrated the ability of our SL Array to distinguish between tissue types and to classify cell lines and tissue samples based on metastatic potential, it is of interest to determine whether the SL Array can be used to distinguish between different molecular subtypes of cancer. The basis of our current research using our SL Array is that glycosylation profiles change during the progression of cancer from healthy to cancerous/metastatic. One of the best-characterized cancers in terms of molecular subtypes is breast cancer. It can be broken down into four broad subtypes (Human Epidermal growth factor Receptor 2 (HER2)-overexpressing, Luminal A, Luminal B and Triple Negative) based on the expression levels of three receptors: HER2, the Estrogen Receptor (ER) and the Progesterone Receptor (PR). Therefore, SLs 1-9 were used in an array format to determine whether we could classify five breast cancer cell lines into these molecular subtypes based on the SL-glycan interactions. In brief, from this analysis (based on SLs 1-9 and evaluated using LDA), we were able to distinguish these four subtypes with 98% accuracy. Given the lack of known markers for TNBC, these results are quite exciting! Furthermore, the ability of our SL Array to distinguish between well-characterized subtypes of breast cancer suggests that it may be of use in this capacity for other types of cancer, for example in determining androgen sensitivity in prostate cancer.


Table 3: SL ‘hits’ from 2BA and 3BA library screens using secreted proteins from human prostate or human colon cell lines, or membrane extracts from human prostate tissue samples (patient-matched normal and adenocarcinoma GS7), illustrating amino acid composition trends, e.g. charged amino acids, number of aromatic rings and boronic acids.

Columns: Name | #BA | Sequence | #R | # Ph rings | Polar (S, T, N, Q) | Library | Against
[Table body illegible in source scan.]


Key Research Accomplishments

•  Synthesized peptoid libraries (PRT). Peptoid-based SL libraries (diversity = 9^…; 5.9 × 10^… members) were synthesized on TentaGel macrobeads, and their utility for identifying SLs targeting proof-of-concept glycoproteins was assessed. The library was also used to further optimize our screening procedures. Screening with this library to identify selective SLs is ongoing. We are also moving toward the synthesis of β-amino acid-containing libraries, which are intrinsically structured/pre-organized and are expected to further aid the identification of SLs with improved selectivity.

•  Synthesized peptide libraries (JJL and PRT). Peptide-based SL libraries (diversity = 11^5; 1.6 × 10^5 members) were synthesized on TentaGel macrobeads and were also used to further optimize our screening procedures and to identify several new selective SLs (see below).

•  Optimization of screening protocols (PRT). The above libraries were used to identify optimized conditions for identifying SLs that selectively bind our proof-of-concept glycoproteins and CAGs. These conditions are: 10 mM HEPES, 150 mM NaCl, 0.1% E. coli lysate (stock conc. 8 mg/mL) and 0.05% TWEEN.

•  Developed a structure-activity relationship (JJL). SL2 and SL5 were used to develop a structure-activity relationship. The key findings were that positive charge and the boronic acid are critical for affinity and selectivity. This information is being fed back into the library design process to aid in the generation and subsequent identification of highly selective SLs (see Tasks 2c and 3a).

•  Identified boroxole as a high affinity sugar binding motif (PRT). The 2-formylphenyl boronic acid moiety was replaced with several different boronic acids to explore boronic acid substituent effects, and thereby identify the factors that promote the selective recognition of a glycan by a particular SL. The key findings were that the substitution pattern did not matter and that substituent effects (e.g. electron donating/withdrawing groups) were minimal. Also, the boroxole moiety was identified as an alternative moiety with improved affinity.

•  Optimization of image capture and analysis (JJL). A MATLAB algorithm was successfully developed to automate data extraction from microscope images of our bead-based assays. The algorithm not only identifies each bead and extracts color space intensity values, but also allows for data rejection based on customizable threshold values for size, circularity and/or color space high-percentile values (i.e., relating to pixel saturation). Using this automated data collection system, additional statistical analyses have been performed on our colon cancer data sets, and using quadratic discriminant analysis and/or support vector machines, our classification accuracies improved from 97% to >99%.

•  Identified  4  additional  SLs  that  bind  proof-of-concept  glycoproteins  (JJL  and  PRT).  Screens  of 
peptide  libraries  containing  either  2-formylphenyl  boronic  acid  or  boroxole  identified  4  additional  SLs  that 
bind  proof-of-concept  glycoproteins. 

•  Identified SLs that selectively bind sialyl Lewis X over Lewis X, sialyl Lewis A, and Lewis A (PRT). Screens of peptide libraries versus biotinylated sialyl Lewis X identified two SLs (SLex1 and SLex2). Confirmation assays demonstrated that SLex2 selectively binds sialyl Lewis X over Lewis X, sialyl Lewis A, and Lewis A.

•  Used existing SL array to demonstrate the utility in diagnosing and staging prostate, breast, and colon cancer (JJL). Using our SL array to classify various colon cancer cell lines according to metastatic potential, we achieved 97% classification accuracy, as reported in our Chem Sci manuscript. With the inclusion of additional colon, breast and prostate cancer cell lines (n = 10; 426 separate measurements), and grouping the different cell lines according to whether they are healthy, cancerous or cancerous/metastatic, we achieve 84% classification accuracy. However, if we look at it from a diagnostic perspective, i.e. cancerous versus non-cancerous, the classification accuracy improves to 95%.
•  Developed a Dual Dye screening protocol. A method was developed to screen the fixed-position-library with mixtures of fluorescein- and rhodamine-labeled analytes. In one case, a fluorescein-labeled membrane extract from the RWPE-1 cell line (normal) was combined with a rhodamine-labeled membrane extract from the PC-3 cell line (cancerous-metastatic), and this mixture was incubated with our SL library. Hits were identified as beads that were bright in one channel or the other (i.e., red or green) but not in both, thereby affording cross-reactivity between normal and cancerous prostate markers.

•  Identified 5 additional SLs that bind prostate cancer related glycoproteins. Screens of the fixed-position-library with FITC-PSA and labeled membrane extracts from RWPE-1 and PC-3 cell lines identified 5 additional SLs that bind prostate cancer related glycoproteins.

•  Identified SL sequencing methods based on Edman degradation. Previous efforts to use Edman degradation methods had failed due to the randomized boronic acid location and linker length (e.g. Lys, Orn, Dab, Dpr). Partial hydrolysis of the peptide backbone was observed after cleavage of the boronic acid moiety with peroxide, further complicating the Edman-based analysis. However, with a fixed location and only one linker for our boronic acid, we have been able to use Edman methods to sequence our SLs with high fidelity, without removing the boronic acid group.

•  Optimized image analysis protocol. A change was made to our MATLAB algorithm to improve the identification and quantification of individual assay beads from weak binding between an SL and a certain analyte (i.e., from dark images). The basic challenge was how to accurately find the edge of a dark bead against the background. In the new MATLAB algorithm, the particles are found using the color channel with the greatest amount of information (e.g. the green channel for fluorescein), thereby improving the reliability and consistency of identifying beads with intensities as low as 5 on an 8-bit scale.

•  Developed a better understanding of the importance of cross-reactivity. Notably, the cross-reactive SLs that exhibit modest selectivity in our proof-of-concept paradigm (ovalbumin, bovine mucin and porcine mucin) consistently provide the most useful information when assaying cancer related samples. For example, SL2 and SL3 account for 66% of the variance, or discriminatory ability of the array, when discriminating six prostate derived cell lines; yet SL2 and SL3 were cross-reactive, displaying no more than a 2-fold selectivity for any of the proof-of-concept glycoproteins.

•  Developed a structure-activity relationship. Continuing studies to evaluate the relationship between SL structure and glycoprotein binding affinity/selectivity, and thus diagnostic prospects, have highlighted the importance of positive charge on the SL. Specifically, a combination of boronic acid-functionalized SLs and highly positively charged SLs lacking boronic acids was used to discriminate colon cancer related cell lines with great effectiveness, in some cases better than when only SLs containing boronic acids were used. This information is being fed back into our design to improve the detection and staging capabilities of our SL Array by providing additional and still different information on the cell line in general.

•  Advanced SL Array design and utility to address clinical challenges. Using flow cytometry to evaluate SLs (10 µm beads), we have demonstrated the utility of our initial SL Array to mimic results obtained using more tedious and time-consuming fluorescence microscopy. Using Linear Discriminant Analysis (LDA), classification accuracies were determined for six prostate derived cell lines, discriminating healthy, cancerous non-metastatic and cancerous metastatic, providing 81% accuracy (the microscope data gave 100%). However, when simply comparing samples as healthy or cancerous using flow cytometry, our SL Array predicted sample class with 97% accuracy. Additionally, moving towards serum-based testing, we have shown that samples secreted into culture media show the same response to our SL Array as those from membrane extracts. Finally, evaluation of human tissue samples matches trends observed from cell lines.

•  Used SL Array to demonstrate diagnostic and staging utility in prostate, breast, and colon cancers. Using LDA to interpret the results from an expanded SL Array including SL1-9, we evaluated 15 human cell lines consisting of four colon derived cell lines (HCT 116, LoVo, HT-29, CCD 841 CoN), five breast derived cell lines (MCF10A, MCF7, MDA-MB-231, BT474, T47D) and six prostate derived cell lines (LNCaP, DU145, PC-3, RWPE-1, WPE1-NA22, WPE1-NB14), obtaining 76% classification accuracy based solely on cancerous or normal status. However, when we included the tissue source (breast, colon or prostate) in our analysis, the overall classification accuracy improved, depending on the tissue type, to between 84% and 90%.

•  Used  a  Dual  Dye  screening  protocol  with  prostate  cell  lines.  Screens  of  the  fixed-position-library  with 
labeled  secreted  glycoproteins  from  RWPE-1  and  DU145  cell  lines  identified  2  additional  SLs  that  bind 
prostate  cancer  related  glycoproteins. 

•  Used a Dual Dye screening protocol with colon cell lines. Screens of the fixed-position-library with labeled secreted glycoproteins from CCD 841 CoN and LoVo cell lines identified 3 additional SLs that bind colon cancer related glycoproteins.
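The bead-rejection rules in the image-analysis items above (thresholds on size, circularity and high-percentile intensity) and the later change of detecting beads in the most informative color channel can be sketched as follows. This is a hypothetical Python illustration, not our MATLAB code. Circularity is taken as 4πA/P², "most informative" is approximated as the channel with the widest intensity range, and every threshold and field name is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    area: float             # pixels
    perimeter: float        # pixels
    high_percentile: float  # e.g. 99th-percentile pixel value, 8-bit scale


def circularity(p):
    """4*pi*A/P**2: 1.0 for a perfect circle, smaller for irregular shapes."""
    return 4 * math.pi * p.area / p.perimeter ** 2


def keep_particle(p, min_area=200, max_area=5000, min_circ=0.8, max_high=250):
    """Hypothetical rejection rule on size, circularity and near-saturation."""
    return (min_area <= p.area <= max_area
            and circularity(p) >= min_circ
            and p.high_percentile <= max_high)


def detection_channel(image_rgb):
    """Index of the channel with the widest intensity range (assumed proxy for information)."""
    ranges = []
    for c in range(3):
        values = [px[c] for row in image_rgb for px in row]
        ranges.append(max(values) - min(values))
    return ranges.index(max(ranges))


def bead_mask(image_rgb, threshold=5):
    """Mark pixels at or above `threshold` (8-bit) in the chosen channel as bead."""
    c = detection_channel(image_rgb)
    return [[px[c] >= threshold for px in row] for row in image_rgb]


round_bead = Particle(area=math.pi * 20 ** 2, perimeter=2 * math.pi * 20, high_percentile=120)
elongated = Particle(area=1250, perimeter=270, high_percentile=120)
print(keep_particle(round_bead), keep_particle(elongated))

# Tiny hypothetical RGB image of a dim, fluorescein-labeled bead.
img = [[(0, 1, 0), (0, 6, 0), (0, 7, 0)],
       [(0, 2, 0), (0, 6, 1), (0, 1, 0)]]
print(detection_channel(img))
print(bead_mask(img))
```

Choosing the detection channel per image, rather than always summing channels, keeps a faint green bead from being swamped by noise in channels that carry no signal.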
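Several items in the list above report higher accuracy once the three classes are merged into two (e.g., 84% to 95% for the multi-tissue cell-line panel, and 81% to 97% for the flow cytometry data). Merging before scoring means that a metastatic sample assigned to the non-metastatic cancer class still counts as correct for diagnosis. The sketch below shows only that bookkeeping, on invented labels; it does not reproduce our data, and the label names and mapping are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical mapping from the 3-class paradigm to the diagnostic question.
TO_DIAGNOSTIC = {
    "healthy": "non-cancerous",
    "non-metastatic": "cancerous",
    "metastatic": "cancerous",
}


def collapse(labels):
    return [TO_DIAGNOSTIC[label] for label in labels]


y_true = ["healthy", "healthy", "non-metastatic", "non-metastatic", "metastatic", "metastatic"]
y_pred = ["healthy", "non-metastatic", "metastatic", "non-metastatic", "non-metastatic", "metastatic"]
print(accuracy(y_true, y_pred))                      # 3-class accuracy
print(accuracy(collapse(y_true), collapse(y_pred)))  # 2-class (diagnostic) accuracy
```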
Reportable Outcomes

•  Published a manuscript in Chemical Science (see Appendices) detailing the utility of SL arrays to discriminate cancer cell lines based on metastatic potential, thereby setting the stage for further developing this approach for the diagnosis and staging of cancer.

•  Kevin  Bicker,  who  played  a  key  role  in  developing  the  SL  array,  will  begin  his  tenure  track  faculty  position 
at  Middle  Tennessee  State  University  in  August  2013. 

•  Lavigne  presented  a  seminar  to  the  College  of  Pharmacy  at  the  Medical  University  of  South  Carolina. 

•  Held  joint  lab  meeting  at  The  Scripps  Research  Institute,  Scripps  Florida,  on  July  25,  2013.  Anna 
Veldkamp,  Kathleen  O’Connell,  and  Daniel  Lewallen  presented  seminars  on  their  SL  studies. 

•  Jing  Sun,  who  played  a  key  role  in  developing  the  SL  array,  began  her  full-time  Instructor  position  at 
Georgia  Southern  University  in  August  2013. 

•  Lavigne  presented  an  invited  seminar  at  the  Southeast  Regional  Meeting  of  the  American  Chemical  Society 
in  Atlanta,  GA  in  November  2013. 

•  Lavigne  presented  a  seminar  in  the  Department  of  Chemical  Engineering  at  Texas  A&M  University  in 
College  Station,  TX  in  May  2013. 

•  Lavigne  spent  one  week  as  a  Visiting  Scientist  in  the  Department  of  Chemical  Engineering  at  Texas  A&M 
University  in  College  Station,  TX  in  May  2013. 

•  Lavigne  and  O’Connell  (post-doctoral  fellow)  participated  in  the  Space,  Cancer  and  Personalized  Medicine 
Conference  at  the  Gibbs  Cancer  Center  and  Research  Institute  in  Spartanburg,  SC  in  May  2014. 

•  Lavigne presented a seminar for the South Carolina Cancer Prevention and Control Program.

•  Lavigne  presented  a  seminar  for  the  Center  for  Colon  Cancer  Research. 

•  Lavigne presented a poster at the Annual MUSC/GRU/USC Joint Cancer Retreat.

•  Lavigne presented an invited seminar at the 250th National Meeting of the American Chemical Society, Division of Organic Chemistry, Teva Pharmaceuticals Scholars Grant Symposium.

•  Lavigne  presented  an  invited  seminar  at  Pacifichem  2015. 

•  Lavigne presented an invited seminar at the XXVIII International Carbohydrate Symposium (ICS).

•  Erin E. Gatrone successfully defended her dissertation entitled “Using Synthetic Lectins to Investigate Metastatic Potential in Colon Cancer”.

•  Anna A. Veldkamp successfully defended her dissertation entitled “Assessing Aberrant Glycosylation with Synthetic Lectins to Detect and Stage Prostate Cancer”.
Conclusions

Significant progress has been made on this project to develop synthetic lectin (SL) arrays that bind to prostate cancer associated glycans and glycoproteins (CAGs) to detect glycosylation patterns associated with cancer. These studies are being pursued to develop this methodology into a robust system, thereby providing a new paradigm that can diagnose and stage prostate cancer. Moreover, these studies directly relate to the “Imaging” and “Biomarker” focus areas of the PCRP overarching challenges. In particular, the progress made towards creating a cross-reactive sensor platform will allow for more reliable diagnosis of prostate cancer and thus improve the likelihood of accurate detection and aid in managing prostate cancer, thereby decreasing many of the negative impacts associated with prostate cancer.

Peptide and peptoid libraries have been synthesized and screened against cancer associated analytes. Consequently, six new synthetic lectins have been identified targeting both glycans (2 new SLs) and glycoproteins (4 new SLs). In so doing, we have been able to improve our methods for binding SLs to CAGs to reduce background binding, thereby improving our signal-to-noise ratio. We have also been able to advance our approaches to 1) acquire assay images, 2) extract assay response values and 3) analyze the assay outcome. Ultimately, these improvements have allowed us to verify the validity of our approach while also improving the overall assay accuracy. As such, we have enlarged our data set to nearly 12,000 measurements while expanding the assay relevance and at the same time maintaining classification accuracies between 93% and 97%. These results reflect assay responses to a combination of prostate, colon and breast cancer cell lines.

In addition to enhancing the overall assay performance, we have also advanced our understanding of what factors are important for SLs to bind CAGs. Specifically, we have demonstrated that boroxoles are efficient replacements for the originally proposed boronic acids and can improve the binding affinity of SLs for certain CAGs. We have also begun to develop a detailed structure-activity relationship that has to date indicated that charge on the SL is important for defining binding affinity with CAGs, while the boronic acids contribute significantly to binding selectivity.

New protocols have been developed to screen our SL libraries based on competitive binding between differently labeled glycoproteins or cell membrane extracts. Subsequently, five hits have been isolated and sequenced that unambiguously target prostate cancer associated glycoproteins. While developing these novel screening methods, we have also been able to improve our sequencing and image analysis techniques. In particular, we can now effectively use Edman degradation schemes to sequence our SLs without having to remove our boronic acid groups. Regarding image analysis, we are now better able to identify our particle edges from fluorescence microscopy while also transitioning to flow cytometry based methods to afford higher throughput and more consistent data acquisition.

In evaluating how SL structure impacts binding affinity and selectivity with glycoproteins, we have gained a better understanding of how cross-reactive SLs contribute to the overall SL Array effectiveness, while also learning a great deal about how the charge and boronic acid group of our SLs impact binding and ultimately influence the utility of our SL Array for diagnosing and staging prostate cancer. As such, we have been able to expand the scope of our SL Array analysis to include breast, colon and prostate derived cell lines. We have also utilized samples obtained from cell-culture media that include secreted glycoproteins, as a step towards evaluating patient serum. Similarly, we have demonstrated that our SL Array can effectively discriminate human colon tissue samples.

As  this  project  progresses,  we  will  continue  to  expand  our  understanding  of  the  factors  important  for  SLs  to 
bind  CAGs.  Specifically,  we  will  study  in  greater  detail  how  our  SLs  are  interacting  with  clinically  relevant 
targets. Ultimately, while initial studies have provided valuable insight into what factors contribute to SL-glycan binding, it is clear that what makes a good binder for PSM is not necessarily what is needed to make a good sensor for detecting prostate cancer, for example. With regard to screening the SL library and using these SLs to discern normal and cancerous cell lines, we have importantly learned: 1) that we cannot take any SL for granted, and 2) that identifying SLs from more biologically relevant samples could provide better classification and more detailed information regarding the particular glycosylation patterns associated with a particular disease state. As novel and exciting approaches, we will endeavor to incorporate fluorescent boronic acids into our SL design, thereby eliminating the need to label our samples, because the fluorescence intensity of such boronic acids changes upon diol binding. Furthermore, we plan to evaluate using our SLs to capture or stain glycoproteins as part of a
spot  array.  Significantly,  we  are  continually  screening  our  libraries  for  new  hits  that  better  target  prostate 
cancer  and  subsequently  these  hits  are  included  into  our  array  and  used  to  better  discriminate  prostate  cancer 
cell  lines  while  simultaneously  improving  our  signaling  strategies,  our  data  analysis  and  the  overall  utility  of 
our  approach. 

Despite being located at two different sites, with JJL at USC and PRT having moved from TSRI to the University of Massachusetts Medical School, the project has continued to grow and evolve through constant email and phone contact, as well as organized weekly meetings and scheduled site visits. As revealed above, each PI has contributed to different aspects of this project, with both PIs having overlapping and supporting roles for the other. Clearly, this team works well together, each providing their own expertise to result in a level of productivity that is greater than that achievable by each PI working independently. Certainly, this project would not exist without the input of both PIs.